Transactional Client-Server Cache Consistency: Alternatives and Performance

MICHAEL J. FRANKLIN, University of Maryland, College Park
MICHAEL J. CAREY, IBM Almaden Research Center
MIRON LIVNY, University of Wisconsin-Madison

Client-server database systems based on a data shipping model can exploit client memory resources by caching copies of data items across transaction boundaries. Caching reduces the need to obtain data from servers or other sites on the network. In order to ensure that such caching does not result in the violation of transaction semantics, a transactional cache consistency maintenance algorithm is required. Many such algorithms have been proposed in the literature and, as all provide the same functionality, performance is a primary concern in choosing among them. In this article we present a taxonomy that describes the design space for transactional cache consistency maintenance algorithms and show how proposed algorithms relate to one another. We then investigate the performance of six of these algorithms, and use these results to examine the tradeoffs inherent in the design choices identified in the taxonomy. The results show that the interactions among dimensions of the design space can impact performance in many ways, and that classifications of algorithms as simply “pessimistic” or “optimistic” do not accurately characterize the similarities and differences among the many possible cache consistency algorithms.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems—Distributed Databases; C.4 [Computer Systems Organization]: Performance of Systems; D.4.8 [Operating Systems]: Performance; H.2.4 [Database Management]: Systems—Concurrency, Distributed Systems, Transaction Processing

General Terms: Performance, Algorithms, Design

Additional Key Words and Phrases: Cache coherency, cache consistency, client-server databases, object-oriented databases, performance analysis

This work was supported in part by the NSF grants IRI-9409575 and IRI-8657323, DARPA contract DAAB07-92-C-Q508, and by a research grant from IBM. Authors’ addresses: M. J. Franklin, Department of Computer Science, A. V. Williams Building, University of Maryland, College Park, MD 20742; M. J. Carey, IBM Almaden Research Center, 650 Harry Road, K55/B1, San Jose, CA 95120; Miron Livny, Computer Sciences Department, 1210 W. Dayton, UW-Madison, WI 53706. Permission to make digital / hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 1997 ACM 0362-5915/97/0900-0315 $03.50. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997, Pages 315–363.

1. INTRODUCTION

1.1 Client-Server Database System Architectures

Advances in distributed computing and object-orientation have combined to bring about the development of a new class of database systems. These systems employ a client-server computing model to provide both responsiveness to users and support for complex, shared data in a distributed environment.
Current relational DBMS products are based on a queryshipping approach in which most query processing is performed at servers; clients are primarily used to manage the user interface. In contrast, object-oriented database systems (OODBMS), which were initially developed to support computationally intensive applications such as computer aided design (CAD), typically support data-shipping, which allows data request processing (in addition to application processing) to be performed at the clients. With data-shipping, DBMS software running on the client machines determines which data items are needed to satisfy a given application request and obtains those items from the server if they can not be found locally. The advantages of data-shipping for object-based DBMSs are two-fold: First, data-shipping moves the data closer to the applications, thus accelerating navigation through persistent data structures via the programmatic interfaces of object-based DBMSs. Second, data-shipping offloads much of the DBMS function from the server to the client workstations, providing both performance and scalability improvements. The performance advantages of data-shipping for navigation-oriented workloads are highlighted by benchmarks such as the 001 (or “Sun”) engineering database benchmark [Cattell and Skeen 1992] and the more recent 007 benchmark [Carey et al. 1993]. As a result of these advantages, data-shipping is used in research prototypes such as ORION [Kim et al. 1990], Client-Server EXODUS [Franklin et al. 1992; Exodus Project Group 1993], SHORE [Carey et al. 1994], and THOR [Liskov et al. 1992], as well as in commercial products such as GemStone [Butterworth et al. 1991], O2 [O. Deux et al. 1991], ObjectStore [Lamb et al. 1991], Ontos [Ontos Inc. 1992], Objectivity [Objectivity Inc. 1991], and Versant [Versant Object Technology 1991]. While data-shipping can be beneficial in client-server object database systems, there is also a potential downside; data-shipping implementations are susceptible to network and/or server bottlenecks that can arise if a high volume of data is requested by clients. The key to avoiding these bottlenecks is to use local client storage resources for data caching. Client data caching enables clients to retain copies of data items that they have received from servers. In the presence of locality (i.e., the affinity of ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 317 applications at certain workstations for certain subsets of the data items), such caching can significantly reduce the volume of data that clients must request from servers. 1.2 Transactional Cache Consistency Client data caching is a dynamic form of data replication. As with any form of data replication, care must be taken to ensure that the presence of multiple copies in the distributed system does not jeopardize the correctness of programs. In a database system, correctness in the presence of concurrency, distribution, replication, and failures is tied to the concept of transactions. It is important, therefore, to distinguish between two types of caching: (1) intratransaction caching, which refers to the caching of data within transaction boundaries; and (2) intertransaction caching, which allows clients to retain locally cached data even across transaction boundaries. 
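To make the distinction concrete before turning to how each form of caching is implemented, the following sketch shows a client-side buffer operating under either policy. It is a minimal illustration in Python: the ClientCache class, its single intertransaction flag, and the end_of_transaction hook are invented for this example and are not taken from any of the systems cited above.

```python
class ClientCache:
    """Minimal client-side page cache; the class and its single flag are illustrative only."""

    def __init__(self, intertransaction: bool):
        self.intertransaction = intertransaction
        self.pages = {}          # page_id -> cached contents
        self.locked = set()      # page_ids protected by the active transaction's locks

    def install(self, page_id, contents, locked=True):
        self.pages[page_id] = contents
        if locked:
            self.locked.add(page_id)

    def end_of_transaction(self):
        """Called when the local transaction commits or aborts and its locks are released."""
        self.locked.clear()
        if not self.intertransaction:
            # Intratransaction caching only: once no active transaction protects a copy
            # it is purged, so the ordinary concurrency control mechanism (e.g.,
            # two-phase locking) is all that is needed to keep cached data valid.
            self.pages.clear()
        # With intertransaction caching the copies survive this boundary, and a
        # separate cache consistency protocol must keep later transactions from
        # committing after reading a copy that has since become stale.
```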
Intratransaction caching can be implemented by purging any cached items that are not protected by an active transaction and relying on the normal concurrency control mechanism (e.g., two-phase locking, etc.) to ensure the validity of the remaining cached data. In contrast, intertransaction data caching allows data items to remain in client caches even outside of transaction boundaries. Such cached items are not protected by the regular (transaction-oriented) concurrency control mechanism, so an additional cache consistency protocol is required to ensure correctness.

Cache consistency protocols for client-server database systems have been the subject of much study in recent years and at least a dozen different algorithms have been proposed and studied in the literature [Wilkinson and Neimat 1990; Carey et al. 1991; Wang and Rowe 1991; Franklin and Carey 1992; Adya et al. 1995]. In terms of semantics, all of these proposed algorithms support the traditional notion of ACID transaction isolation; that is, they all ensure that transactions always see a serializable, or “degree three” [Gray and Reuter 1993] view of the database.1 The proposed algorithms, however, differ greatly in the details of their implementation; the developers of these algorithms have made widely varying choices along numerous design dimensions. Thus, despite the fact that all of these algorithms provide an identical level of protection, the performance of the various algorithms can be expected to differ significantly.

While the papers mentioned previously have all included performance comparisons of several of the proposed algorithms, none of these studies have attempted to unify and explain the large design space for transactional client-server cache consistency algorithms. As a result, it has been difficult to compare and contrast the set of proposed algorithms and to choose among them. In this article we address this problem by proposing a taxonomy of transactional cache consistency maintenance algorithms that encompasses all of the algorithms that have been examined in the previously mentioned studies. The taxonomy outlines the numerous dimensions of the design space for transactional cache consistency algorithms and shows how proposed algorithms relate to one another. After describing this taxonomy, we then use it to drive an analysis of performance results for three families of algorithms across a range of workloads.

Footnote 1: Because all of these algorithms are intended to support navigation-oriented data access, they do not explicitly address the “phantom” problem [Gray and Reuter 1993] that arises in associative, predicate-based access.

1.3 Relationship to Other Caching Systems

Caching has been used to reduce latency and improve scalability in many different computing environments, including multiprocessors [Archibald and Baer 1986; Stenstrom 1990; Lilja 1993; Adve and Gharachorloo 1995], distributed shared memory (DSM) systems [Li and Hudak 1989; Nitzberg and Lo 1991; Keleher et al. 1992], and distributed file systems [Howard et al. 1988; Nelson et al. 1988; Levy and Silberschatz 1990]. The maintenance of cache consistency has been addressed to varying degrees in all of these environments.
While many of the underlying techniques used to enforce cache consistency are common across all of these environments, there are two aspects of modern object-oriented database systems that distinguish solutions in this environment from the others: (1) transactions, and (2) the client-server architecture. The transactional nature of caching in client-server database systems both imposes constraints and provides additional implementation opportunities for cache consistency maintenance protocols. The constraints include the requirement to provide the ACID semantics, which combine correctness criteria for concurrent execution of arbitrary programs by multiple users with guarantees for fault tolerance. Opportunities arise from the welldefined points of synchronization provided by the transaction model and the ability to piggyback cache consistency information on the protocols used to support concurrency control and recovery. The client-server OODBMS architecture impacts the design of cache consistency maintenance protocols by enforcing a specific partitioning of responsibilities between clients and servers. The relationship between transactional caching in client-server object database systems and data caching in other environments is addressed in further detail in Section 3. 1.4 Scope of the Article This article makes several contributions. One main contribution is the presentation of a taxonomy that provides a unified treatment of proposed cache-consistency algorithms for client-server object database systems. A key insight that arises from the taxonomy is the usefulness of classifying algorithms based on whether they detect or avoid access to stale cached data. This distinction is in contrast to the possibly more intuitive one between optimistic and pessimistic approaches, which does not accurately characterize the complex algorithms that have been proposed. While the taxonomy is complete in the sense that it encompasses the major algorithms that have been studied in the literature, it does not, of course, cover all conceivable algorithms. The body of work covered by this taxonomy ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 319 consists of proposed algorithms that provide serializable transaction execution; the taxonomy does not address the many possible relaxations of these semantics. Instead, because all of the algorithms provide exactly the same semantics to client-based transactions, the emphasis of the taxonomy is on issues that impact performance. After describing the taxonomy and placing the proposed algorithms in it, we then use the taxonomy to guide the analysis of the performance results that we have obtained in detailed simulation studies of a subset of the algorithms. This study includes the description of a suite of synthetic workloads that have been used to explore the performance of caching algorithms in a number of scenarios. Finally, the insight gained by analyzing the performance results in the context of the taxonomy is used to reflect upon the characteristics of the other proposed algorithms that have appeared in the literature. In this way, the taxonomy is used to shed light both on the nature of the design space for transactional cache consistency algorithms and on the performance tradeoffs implied by many of the choices that exist in the design space. 
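Before moving on, one of the opportunities noted in Section 1.3, piggybacking cache consistency information on messages that the concurrency control and recovery protocols already exchange, can be made concrete with a small sketch. Everything here (HintingServer, Reply, pending hints keyed by client) is invented for illustration; the algorithms surveyed later differ in exactly what they piggyback and when.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Reply:
    page_id: int
    contents: bytes
    invalidate_hints: List[int] = field(default_factory=list)   # piggybacked consistency info

class HintingServer:
    """Illustrative only: consistency hints ride on replies the client needed anyway,
    rather than costing a separate message."""

    def __init__(self, pages: Dict[int, bytes]):
        self.pages = dict(pages)
        self.pending: Dict[str, Set[int]] = {}       # client_id -> page ids now stale there

    def note_commit(self, updated_page: int, holders: Set[str]):
        # Recorded when an update commits; delivery waits for the next round trip.
        for client in holders:
            self.pending.setdefault(client, set()).add(updated_page)

    def fetch(self, client_id: str, page_id: int) -> Reply:
        hints = sorted(self.pending.pop(client_id, set()))
        return Reply(page_id, self.pages.get(page_id, b""), invalidate_hints=hints)
```

The point is only that the hint travels on a reply the client was waiting for anyway; Section 4.2.2 returns to how such hints are actually used.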
The remainder of the article is organized as follows: Section 2 outlines a reference architecture for a data-shipping database system and describes the constraints on the algorithm design space that are implied by the architecture. Section 3 discusses cache consistency issues that have been addressed in the other computing environments. Section 4 presents our taxonomy of transactional cache consistency maintenance algorithms. Section 5 describes three families of transactional cache consistency maintenance algorithms in greater detail and examines their performance using a detailed simulation model. Section 6 comments on the tradeoffs made by the remaining algorithms in the taxonomy. Section 7 presents our conclusions.

2. CACHING IN CLIENT-SERVER DATABASE SYSTEMS

2.1 Reference Architecture

Fig. 1. Reference architecture for a data-shipping DBMS.

Figure 1 shows a reference architecture for a data-shipping client-server DBMS. The underlying hardware is typical of that found in today’s computer-assisted work environments (e.g., CAD, CAM, CASE, etc.). As shown in the figure, applications in a data-shipping DBMS execute at the client workstations. The DBMS consists of two types of processes that are distributed throughout the network. First, each client workstation runs a Client DBMS process, which is responsible for providing access to the database for the applications running at the local workstation. For protection reasons, the applications run in a separate address space from their local Client DBMS process, though some shared memory may be used for efficiency.2 Applications send database access requests to their local Client DBMS process, which executes the request, in turn sending requests for transaction support and for specific data items to the Server DBMS processes.

Footnote 2: The process boundaries described here are typical, but not universal. For example, in EXODUS, applications are linked into a single process with the client DBMS code.

Server DBMS processes are the actual owners of data, and are ultimately responsible for preserving the integrity of the data and enforcing transaction semantics. The Server DBMS processes manage the stable storage on which the permanent version of the database and the log reside. They also provide concurrency control and copy management functions for the data that they own. For simplicity, we assume that the database is statically partitioned across the servers. In general, data can be replicated among the servers in order to improve availability and/or performance; such replication, however, is beyond the scope of this article.

In a data-shipping architecture, each Client DBMS process is responsible for translating local application requests into requests for specific database items and for bringing those items into memory at the client. As a result, all of the data items referenced by an application are ultimately brought from the server(s) to the client. Some or all of these items may be cached at the clients in order to reduce transaction path length and server load. A Server DBMS process is responsible for providing the most recent committed values for the data items that it owns in response to client requests; of course, due to concurrency control conflicts, it may not be possible for the server to provide each requested item immediately.
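A minimal sketch of the data-shipping read path implied by this architecture follows. ClientDBMS, ServerDBMS, and request_page are invented names, and a real system would wrap this skeleton with transaction state, concurrency control, and buffer replacement.

```python
class ServerDBMS:
    """Owns the data and the log; a sketch, not the API of any system cited above."""

    def __init__(self, database):
        self.database = dict(database)       # page_id -> latest committed contents

    def request_page(self, client_id, page_id):
        # Returns the most recent committed value. Under concurrency control
        # conflicts a real server might block or queue the request instead.
        return self.database[page_id]

class ClientDBMS:
    """Runs at the workstation between the application and the server(s)."""

    def __init__(self, client_id, server):
        self.client_id = client_id
        self.server = server
        self.cache = {}                      # page_id -> locally cached copy

    def read(self, page_id):
        # Data-shipping: satisfy the request from the local cache when possible,
        # otherwise ship the page from its owning server and keep a copy.
        if page_id not in self.cache:
            self.cache[page_id] = self.server.request_page(self.client_id, page_id)
        return self.cache[page_id]

client = ClientDBMS("c1", ServerDBMS({1: b"page-1"}))
assert client.read(1) == b"page-1"           # first read ships the page; later reads hit the cache
```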
Data-shipping systems can be structured either as page servers, in which clients and servers interact using fixed-length, physical units of data such as pages (typically on the order of four or eight Kbytes) or object servers, which interact using logical, possibly variable-length, units of data such as tuples or objects. Aspects of the tradeoffs between page servers and object servers have been studied for the single-user case in DeWitt et al. [1990] and Kemper and Kossmann [1994], and for the multiuser case in Carey et al. [1994], Chu and Winslett [1994] and Adya et al. [1995]. For concreteness, this article focuses on page-server architectures in which data transACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 321 fer, concurrency control, and cache consistency are all performed at the granularity of a single page. The algorithms discussed here, however, are equally applicable to both page servers and object servers in that they assume that the same granularity (i.e., page or object) is used for concurrency control and cache consistency. The issues that arise when mixing levels of granularity for any of these functions are addressed in Carey et al. [1994] and Chu and Winslett [1994]. 2.2 Architectural Implications for Caching An important aspect of any data-shipping system is that all of the DBMS processes (both Client and Server) have storage resources (memory and/or disk) that they use for buffering data. The database system as a whole is responsible for managing buffers across the entire collection of machines. Similarly, the other distributed resources of the system, such as CPUs and disks, can also be exploited by the DBMS. In fact, in a typical environment with fairly high-powered client machines (e.g., “fat” clients such as state-ofthe-art PCs or workstations) the majority of the aggregate processing and storage resources available to the DBMS are likely to reside at the clients. As in other computing environments, client caching is a key technique for exploiting these resources. The design of a client caching mechanism for a data-shipping database system must respect the correctness and availability constraints of that environment. Workstation-server database systems must be capable of providing the same level of transaction support as more traditional database architectures, including serializability. Because client caching is essentially a form of data replication, correctness criteria for managing replicated data are applicable in this environment. The extension of serializability to replicated data is called one-copy serializability [Bernstein et al. 1987]. A one-copy serializable execution of transactions on a replicated database is equivalent to some serial execution of those transactions on a nonreplicated database. In terms of availability, a client caching mechanism must be designed in a way that the crash or disconnection of an individual client workstation does not impact the availability of data for applications running at other clients. Reducing the impact of client failures is crucial for several reasons. First, if individual client failures are allowed to inhibit availability, then the scalability of the system may be limited. Second, it is more cost-effective to fortify the (fewer) server machines (e.g., by adding duplexed log disk storage or nonvolatile memory) than to bullet-proof all of the client machines. 
Finally, since client machines are typically located on users’ desktops or in their homes or briefcases, they are typically less closely supported by an operations staff than servers. The combination of these correctness and availability concerns leads to the identification of two key properties of client data caching in a database environment: dynamic replication and second-class ownership. Dynamic replication means that page copies are created and destroyed based on the runtime demands of clients. When a client needs to access a ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 322 • M. J. Franklin et al. page, a copy of that page is placed in the client’s cache if one does not already exist. Page copies are removed from a client’s cache in order to make room for more recently requested ones or, under some caching algorithms, because they become invalid. This is in contrast to static replication, in which replication is determined as part of physical database design (e.g., Stonebraker [1979]). Second-class ownership refers to the fact that the cached copies of pages at clients are not considered to be the equals of the actual data pages, which are kept at the server. One problem with replication is that it can reduce data availability for updates in the presence of failures (e.g., network partition) in a distributed environment [Davidson et al. 1985]. Second-class ownership allows consistency to be preserved without sacrificing availability.3 Specifically, the server always retains locally any and all data that is necessary for ensuring transaction durability (e.g., data pages, logs, dirty page information, etc.), so that client-cached pages can be destroyed at any time without causing the loss of committed updates. This notion is crucial to data availability, as it allows the server to consider a client to be “crashed” at any time, and thus, to unilaterally abort any transactions active at that client. As a result, the system as a whole is never held hostage by an uncooperative or crashed client. Client caching as defined above provides the ability to exploit client resources for performance and scalability improvements. The protocol used to maintain cache consistency, however, imposes its own costs on the database system. Depending on the protocol used, these costs can include communication with the server, additional transaction aborts, and reduced efficiency for client cache usage. Given that a requirement for any such protocol is that it supports ACID transactions and respects the architectural constraints described above, the challenge is to design a protocol that incurs minimal overhead across a range of workloads. The remainder of this article investigates the design space of the transactional cache consistency maintenance algorithms for client-server database systems that have been proposed in the literature, and analyzes the performance of a number of these algorithms. 3. RELATED WORK As discussed in Section 1.3, cache consistency issues arise in many types of distributed and/or parallel systems including multiprocessors, distributed shared memory systems, and distributed file systems, as well as other database architectures, such as shared-disk database systems. While there are many similarities between the basic consistency maintenance tech3 The term “secondary-class ownership” is derived from a similar concept called “second-class replication” used in the CODA distributed file system [Kistler and Satyanarayanan 1991]. 
The two notions are similar in that a distinction is made between the “worth” of different types of copies. They differ, however, in that the second-class replicas of CODA are used to increase availability by allowing access to inconsistent data, whereas our notion of second-class copies is used to enhance the availability of consistent data. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 323 niques available in client-server database systems and in these other environments, there are significant differences in the systems that impact the design alternatives and their inherent tradeoffs. In the following sections we first describe how transactional cache consistency differs from cache consistency in nondatabase environments, and then describe how client-server database systems differ from shared-disk database systems. Important areas of difference for each of the systems include some or all of the following: (1) correctness criteria, (2) granularity of caching, (3) inherent cost trade-offs, and (4) workload characteristics. 3.1 Nondatabase Environments Much of the early work in cache consistency maintenance was done in the context of shared-memory multiprocessors. A number of early protocols for such systems were studied in Archibald and Baer [1986]; more recent surveys appear in Stenstrom [1990] and Lilja [1993]. The traditional notion of correctness in multiprocessor systems is sequential consistency [Lamport 1979], which aims to ensure that program execution on a multiprocessor machine with distributed memory (i.e., processor caches) provides the same results as an interleaved multithreaded execution on a uniprocessor machine with a single memory. Sequential consistency, therefore, deals with the ordering of individual memory accesses rather than compound units such as transactions. More recent work in the multiprocessor area has focused on developing alternative models that provide improved performance at the expense of requiring programmers (or compilers) to correctly place explicit synchronization primitives in their code [Adve and Gharachorloo 1995]. Many of these models, such as release consistency [Gharachorloo et al. 1990], allow programmers to combine multiple memory accesses into units that are protected by synchronization. For correctness, however, the composition of these units must be known and agreed upon a priori by all concurrent processes that may possibly access affected memory locations. Such a priori knowledge is a reasonable assumption in this environment, as the goal is typically to provide concurrent execution of a single, multithreaded program. Sharing of data across separate programs is not directly supported by these protocols. Also, all of these models have as their goal the provision of semantics that approach those of (volatile) memory in a uniprocessor, so they do not include support for fault tolerance. Database systems have a different set of requirements than those of the multiprocessor memory algorithms. ACID semantics provide correctness for concurrent execution of transactions containing arbitrary groups of operations and ensure correct execution, even in the presence of failures. This is important because database systems must support correct access to a shared database in the presence of a constantly changing workload. Database workloads are typically imposed by many different users who concurrently present a mix of ad hoc queries and updates to the system. 
As a result of these differences in focus, the basic techniques that have been ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 324 • M. J. Franklin et al. developed for database systems, such as two-phase locking or optimistic concurrency control, do not exist in the multiprocessor domain. As will be seen in Sections 4 and 5, the cache consistency algorithms that have been proposed for client-server database systems are based closely on these fundamental techniques. Nevertheless, despite the differences in emphasis, there are basic common issues that must be addressed by all cache consistency maintenance algorithms. For example, stale cached copies can be dealt with in two ways: by invalidating them or by propagating new values to the affected caches (i.e., write-invalidate and write-broadcast [Archibald and Baer 1986]). Also, consistency actions can be distributed through the use of a broadcast medium (e.g., snooping caches [Goodman 1983]) or can be maintained in directories (e.g., Agarawal et al. [1988]). Even at this level, however, the substantial architectural differences (e.g., message cost, cache consistency granularity, data access granularity, peer-to-peer versus client-server) between data-shipping database systems and shared-memory multiprocessors limit the extent to which results from one area can be applied in the other. Distributed Shared Memory (DSM) systems [Li and Hudak 1989; Nitzberg and Lo 1991] have cost tradeoffs that are closer to those in a data-shipping database environment. DSM systems provide the abstraction of a shared virtual memory address space that spans the nodes of a distributed system. Unlike multiprocessor caching, which can rely heavily on hardware support, DSMs are typically implemented in software with only minimal hardware assists. Because pages (or at least relatively large cache lines) are the unit of consistency, DSMs are similar to client-server databases with respect to granularity. In terms of cost trade-offs, DSMs are again closer to page servers than are shared-memory multiprocessors, because messages are required for consistency (although if the DSM is built on a multiprocessor, messages may be less expensive). The main differences, therefore, lie in the area of correctness criteria for DSM systems, which are typically the same as those for multiprocessors. Because of the high cost of messages in a distributed environment, recent improvements to DSM cache consistency protocols such as Lazy Release Consistency [Keleher et al. 1992] exploit communication patterns among processors to reduce the message overhead. Many of the protocols for data-shipping database systems also try to exploit existing communication patterns. In the database environment, however, these patterns are determined to a large extent by the protocols used between clients and servers to support ACID transactions (e.g., two-phase locking, write-ahead-logging, and optimistic concurrency control). Client caching has been used in distributed file systems since some of the earliest work in the area (e.g., DFS [Sturgis et al. 1980]). Many distributed file systems that support some form of client caching have been proposed and built. A survey of distributed file systems can be found in Levy and Silberschatz [1990]. As with data-shipping DBMSs, these systems use client caching to improve performance and scalability. However, they ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 
Cache Consistency • 325 support much less stringent notions of correctness in terms of both concurrency and failure semantics. Furthermore, distributed file systems are typically designed for workloads in which read-write sharing is rare (e.g., Baker et al. [1991]) and caching is often done at a fairly coarse granularity, such as entire files or large portions of files. Even so, the algorithms used in distributed file systems (e.g., Andrew [Howard et al. 1988] and Sprite [Nelson et al. 1988]) have served as the inspiration for at least one important class of cache consistency algorithms for client-server database systems (i.e., callback locking). 3.2 Shared-Disk Database Systems In addition to work outside of the database area, cache-consistency issues have also been addressed in multiprocessor database architectures other than client-server systems. Transactional cache consistency is required in any database system that supports dynamic caching. One such class of systems is shared-disk (or data sharing) parallel database systems, which consist of multiple nodes with private processors and memory that share a common disk pool [Bhide and Stonebraker 1988]. While similar in some respects to the client-server database systems addressed in this study, they differ in three significant ways. First, since nodes are not assigned to individual users, there is likely to be less locality of access at the individual nodes. Secondly, the cost of communication among nodes in a shared-disk environment is substantially lower than would be expected in the local area network of a page server DBMS. Thirdly, the structure of a shared disk system is peer-to-peer, as opposed to the client-server structure of a page server system, so many of the environmental considerations raised in Section 2 do not apply. A number of papers on shared-disk caching performance have been written by a group at IBM Yorktown. One of their earlier papers examined cache consistency protocols that were integrated with the global lock manager of a shared-disk system [Dias et al. 1987]. Later work has addressed the impact of data skew and contention for a range of possible algorithms [Dan et al. 1990; Dan and Yu 1991], the interaction between private and shared buffering [Dan et al. 1991] (similar to the interactions between client buffers and the server buffer), and extensions to callbackstyle algorithms [Dan and Yu 1992]. Other related work in this area includes the work of Mohan and Narang [1991], Rahm [1993], and Lomet [1994]. An algorithm that dynamically adjusts the granularity at which locking and coherency are managed for a shared-disk DBMS was introduced in Joshi [1991]. This approach was later extended for use in hybrid page server environments in Carey et al. [1994]. 4. A TAXONOMY OF ALGORITHMS In this section we provide a taxonomy of transactional cache consistency algorithms that encompasses the major algorithms that have appeared in the literature, including Wilkinson and Neimat [1990], Carey et al. [1991], ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 326 • M. J. Franklin et al. Fig. 2. Taxonomy of detection-based algorithms. Wang and Rowe [1991], Franklin and Carey [1992] and Adya et al. [1995]. Recall that all of these algorithms provide strict one-copy serializability and are applicable to page server DBMSs (although some were originally proposed for object servers). The taxonomy is shown in Figures 2 and 3. 
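Since the figures themselves do not reproduce here, the sketch below simply records, as Python enums, the top-level split and the lower-level design dimensions that Sections 4.2 and 4.3 go on to define. The identifiers are paraphrases of the taxonomy's labels, not names from any implementation.

```python
from enum import Enum, auto

class InvalidAccessPrevention(Enum):       # top-level split of the taxonomy (Figures 2 and 3)
    DETECTION = auto()                     # lazy: accesses to cached copies are validated
    AVOIDANCE = auto()                     # eager: ROWA keeps stale copies out of client caches

# Detection-based side (Section 4.2).
class ValidityCheckInitiation(Enum):
    SYNCHRONOUS = auto()
    ASYNCHRONOUS = auto()
    DEFERRED = auto()

class ChangeNotificationHints(Enum):
    NONE = auto()
    AFTER_COMMIT = auto()                  # the published hint-using algorithms notify post-commit

# Shared by both sides: what happens to a remotely cached copy when an update arrives.
class RemoteUpdateAction(Enum):
    PROPAGATION = auto()
    INVALIDATION = auto()
    DYNAMIC = auto()

# Avoidance-based side (Section 4.3).
class WriteIntentionDeclaration(Enum):
    SYNCHRONOUS = auto()
    ASYNCHRONOUS = auto()
    DEFERRED = auto()

class WritePermissionDuration(Enum):
    SINGLE_TRANSACTION = auto()
    ACROSS_TRANSACTIONS = auto()

class RemoteConflictPriority(Enum):
    WAIT = auto()
    PREEMPT = auto()
```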
Branches of the taxonomy for which to the best of our knowledge no algorithms have been published are shown using dashed boxes in the figures. A key aspect of this taxonomy is the choice of Invalid Access Prevention as the main criterion for differentiating algorithms. As explained in the following, algorithms that use avoidance for invalid access prevention ensure that all cached data is valid, while those that use detection allow stale data to remain in client caches and ensure that transactions are allowed to commit only if it can be verified that they have not accessed such stale data. There are many possible ways to organize the design space for cache consistency algorithms, and at first, it might seem odd to use the avoidance/detection distinction as the most fundamental decision point in the taxonomy. A different, and possibly more intuitive approach is to divide the ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency Fig. 3. • 327 Taxonomy of avoidance-based algorithms. taxonomy along the lines of concurrency control and replicated data management, as has been done for algorithms in the shared disks environment [Rahm 1991]. Because the two concepts are so closely interrelated, however, dividing a taxonomy at the highest level along these lines can result in substantial duplication of mechanism within the taxonomy, hurting its descriptive effectiveness. Another possible approach would be to classify algorithms at the coarsest level as either “pessimistic” or “optimistic”. As will be seen in the following sections, such a binary classification is not meaningful for many algorithms; in fact, many of the algorithms that have been proposed use combinations of both pessimism and optimism that interact in complex ways. By using the invalid access prevention policy as the coarsest level of differentiation in the taxonomy, such hybrid algorithms can be easily accommodated. The next section describes the avoidance-based and detection-based approaches for invalid access prevention. Because this choice is a major distinction among algorithms, the properties upon which the lower levels of the taxonomy are based differ depending on which invalid access prevention policy is used. The lower levels of the taxonomy for each option are then elaborated in the sections that follow. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 328 • M. J. Franklin et al. 4.1 Invalid Access Prevention Transactional cache consistency maintenance algorithms must ensure that no transactions that access stale (i.e., out-of-date) data are allowed to commit. A data item is considered to be stale if its value is older than the item’s latest committed value.4 In this taxonomy we partition consistency maintenance algorithms into two classes according to whether their approach to preventing stale data access is detection-based or avoidancebased. Qualitatively, the difference between these two classes is that detection-based schemes are lazy, requiring transactions to check the validity of accessed data, while avoidance-based schemes are eager, they ensure that invalid data is quickly (and atomically) removed from client caches. Some avoidance-based schemes also retain certain access permissions across transaction boundaries in order to protect cached data that is not accessed by an active transaction. Detection-based schemes allow stale data copies to reside in a client’s cache for some period of time. 
Transactions must therefore check the validity of any cached page that they access before they can be allowed to commit. The server is responsible for maintaining information that will enable clients to perform this validity checking. Detection-based schemes are so named because access to stale data is explicitly checked for and detected. In contrast, under avoidance-based algorithms, transactions never have the opportunity to access stale data. Avoidance-based algorithms use a read-one/write-all (ROWA) approach to replica management. A ROWA protocol ensures that all existing copies of an updated item have the same value when an updating transaction commits. Avoidance-based algorithms can thus be said to avoid access to stale data by making such access impossible. In a ROWA scheme, a transaction is allowed to read any copy of a data item (which will typically be the one in its local client cache, if such a copy exists). Updates, however, must be reflected at all of the copies that are allowed to exist in the system beyond the updating transaction’s commit point.5 Before proceeding, it should be noted that detection-based algorithms can be augmented with techniques similar to those employed to enforce consistency in avoidance-based algorithms. In fact, three of the detection-based algorithms cited in Figure 2 use asynchronous update notifications (i.e., they asynchronously install new values or remove stale values at remote clients) in order to reduce the probability of having stale data in the client 4 Some concurrency control techniques based on multiple versions allow read-only transactions to access stale data provided that they see a consistent snapshot of the database [Bernstein et al. 1987]. While such algorithms are beyond the scope of this article, several of the consistency algorithms do allow multiple active transactions to simultaneously access different values of the same page, provided that strict (i.e., commit order) serializability is not violated. 5 As described in Section 2.2, the use of second-class replication allows the server to unilaterally eliminate any unreachable copies from the protocol so that transaction processing can continue. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 329 caches. A key point, however, is that these three algorithms lie strictly in the detection-based camp, as the notifications are sent only as “hints”. That is, despite the use of hints, these algorithms still allow stale cache contents to be present and potentially accessed by transactions and thus, must ultimately depend on detection to ensure that committing transactions have not accessed any stale data. In contrast, the use of avoidance obviates any possible need for detection, so there is no augmentation in the opposite direction within the taxonomy. 4.2 Detection-based Algorithms A number of detection-based algorithms (shown in Figure 2) have been proposed and studied in the literature. An advantage of the detection-based approach is simplicity. Because their consistency actions involve only a single client and the server, the detection-based approaches allow the cache management software on the clients to be greatly simplified compared to the ROWA approach. For example, using detection, the system software can be structured such that clients do not ever have to receive asynchronous messages from the server. The EXODUS storage manager [Exodus Project Group 1993] chose a detection-based approach largely for this reason. 
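As one deliberately simplified picture of detection, the sketch below assumes the server tracks a version number per page, which is just one possible form of the information that enables validity checking, and refuses to commit a transaction whose read or written copies are out of date. Avoidance-based schemes need no such check, because ROWA never lets a stale copy be read.

```python
class DetectionServer:
    """Sketch of commit-time validity checking. Per-page version numbers are an
    assumption: one simple form of the bookkeeping a server could keep so that
    access to stale copies is detected before a transaction commits."""

    def __init__(self):
        self.versions = {}                     # page_id -> latest committed version

    def validate_and_commit(self, read_set, write_set):
        """read_set / write_set map page_id -> version the client's copy was based on."""
        for page_id, seen in {**read_set, **write_set}.items():
            if self.versions.get(page_id, 0) != seen:
                return False                   # a stale copy was accessed: abort
        for page_id in write_set:              # install the updates as the new latest values
            self.versions[page_id] = self.versions.get(page_id, 0) + 1
        return True
```

For instance, a transaction that read page 7 at version 3 would be rejected here if another client had meanwhile committed version 4 of that page.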
The disadvantage of detection-based approaches, however, is a greater dependency on the server, which can result in additional overhead. There are three levels of differentiation in the detection-based side of the taxonomy: validity check initiation, change notification hints, and remote update action. 4.2.1 Validity Check Initiation. The coarsest level of differentiation for the detection-based half of the taxonomy is based on the point (or points) during transaction execution at which the validity of accessed data is checked. The validity of any accessed data must be determined before a transaction can be allowed to commit; consistency checks for all data touched by a transaction must therefore begin and complete during the execution of the transaction. In the taxonomy, three classes of validity checking strategies are differentiated: —Synchronous, on each initial access to a page (cached or otherwise) by a transaction. —Asynchronous, with checking initiated on the initial access. —Deferred, until a transaction enters its commit processing phase. All three classes have the property that once the validity of a client’s copy of a data item is established, that copy is guaranteed to remain valid for the duration of the transaction. To implement this guarantee, the server must not allow other transactions to commit updates to such items until a transaction that has received a validity guarantee finishes (commits or ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 330 • M. J. Franklin et al. aborts). As a result, transactions must obtain permission from the server before they are allowed to commit an update to a data item.6 Synchronous validity checking is the simplest of the three classes. On the first access that a transaction makes to a cached data item, the client must check with the server to ensure that its copy of the item is valid. This is done in a synchronous manner—the transaction is not allowed to access the item until its validity has been verified. Once the validity of the client’s copy of the item has been established (which may involve the sending of a new, valid copy to the client), the copy is guaranteed to remain valid at least until the transaction completes. Asynchronous validity checking is similar, but the transaction does not wait for the result of the check. Rather, it proceeds to access (or write) the local copy under the assumption that the check will succeed. If this optimism turns out to be unfounded, then the transaction must abort. Finally, deferred validity checking is even more optimistic than asynchronous checking. No consistency actions pertaining to cached data are sent to the server until the transaction has completed its execution phase and has entered its commit phase. At this point, information on all the data items read and written by the transaction is sent to the server, and the server determines whether or not the transaction should be allowed to commit. These three classes provide a range from pessimistic (synchronous) to optimistic (deferred) techniques. Therefore, they represent different tradeoffs between checking overhead and possible transaction aborts. Deferring consistency actions can have two advantages. First, and most significantly, consistency actions can be bundled together in order to reduce and/or amortize consistency maintenance overhead. Secondly, the consistency maintenance work performed for a transaction that ultimately aborts is wasted; deferred consistency actions can avoid some of this work. 
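The three initiation strategies can be sketched as follows. DetectionClient, check_valid, and the bookkeeping are assumptions made for illustration, and the asynchronous branch is simplified in that the outcome of an in-flight check is only consulted at commit rather than aborting the transaction as soon as a negative answer arrives.

```python
from enum import Enum, auto

class CheckMode(Enum):
    SYNCHRONOUS = auto()
    ASYNCHRONOUS = auto()
    DEFERRED = auto()

class Abort(RuntimeError):
    pass

class DetectionClient:
    """Illustrative client-side access path; server.check_valid(page, version) is an
    assumed call that answers whether the cached copy is still the latest committed one."""

    def __init__(self, server, mode: CheckMode):
        self.server, self.mode = server, mode
        self.cache = {}        # page_id -> (version, contents)
        self.accessed = {}     # pages touched by the current transaction -> version seen
        self.in_flight = []    # asynchronous checks whose answers arrive later

    def load(self, page_id, version, contents):
        self.cache[page_id] = (version, contents)

    def access(self, page_id):
        version, contents = self.cache[page_id]
        if page_id not in self.accessed:
            self.accessed[page_id] = version
            if self.mode is CheckMode.SYNCHRONOUS:
                # Block on the first access until the server vouches for the copy.
                if not self.server.check_valid(page_id, version):
                    raise Abort("stale copy detected on initial access")
            elif self.mode is CheckMode.ASYNCHRONOUS:
                # Fire the check but keep executing; simplified here by collecting
                # the answer at commit instead of aborting the moment it returns.
                self.in_flight.append((page_id, version))
            # DEFERRED: no consistency action until the commit phase.
        return contents

    def commit(self):
        if self.mode is CheckMode.ASYNCHRONOUS:
            unresolved = self.in_flight
        elif self.mode is CheckMode.DEFERRED:
            unresolved = list(self.accessed.items())
        else:
            unresolved = []    # every copy was validated synchronously on first access
        if not all(self.server.check_valid(p, v) for p, v in unresolved):
            raise Abort("stale access detected before commit")
        self.accessed.clear()
        self.in_flight.clear()
```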
There are also potential disadvantages to deferring consistency actions, however. The main disadvantage is that deferral can result in the late detection of data conflicts, which will cause the abort of one or more transactions. The asynchronous approach is a compromise; it aims to mitigate the cost of interaction with the server by removing it from the critical path of transaction execution, while at the same time lowering the abort rate and/or cost through the earlier discovery of conflicts. 4.2.2 Change Notification Hints. The emphasis on optimistic (i.e., asynchronous and deferred) techniques found in the literature on detectionbased algorithms is an artifact of the cost tradeoffs of the page server environment. Communication with the server is an inherently expensive operation, so designers of detection-based algorithms often use optimism to reduce this cost. Optimistic techniques are oriented towards environments in which conflicts are rare and the cost of detecting conflicts is high. While 6 Although it is not strictly necessary, all of the algorithms shown in Figure 2 use the same initiation method for update permission requests as they do for validity checks. If this were not the case, validation and update would require separate dimensions in the taxonomy. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 331 there is currently no definitive understanding of page server DBMS workloads, it is generally assumed that such workloads have lower levels of conflict than more traditional DBMS workloads, such as transaction processing [Cattell 1991]. In a transactional caching environment, however, the notion of conflict must take into account not only concurrent data sharing, but also sequential sharing. Sequential sharing arises when transactions that do not run concurrently access the same data. Because caching strives to retain data at a site even after a transaction has completed, the cache consistency maintenance algorithm must also deal effectively with this type of sharing. Recent studies of file system workloads [Ramakrishnan et al. 1992; Sandhu and Zhou 1992] indicate that sequential sharing may, in fact, be quite common in the types of situations in which page servers are intended to be used. Two approaches to reducing the potential for aborts in optimistic techniques have been proposed. One is to treat “hot spot” data differently, e.g., by switching to a more pessimistic protocol for such data (e.g., Adya et al. [1995]). The other is to use techniques from the avoidance-based (ROWA) algorithms to reduce the amount of stale data that resides in client caches. We call such techniques change notification hints. As can be seen in Figure 2, three of the algorithms found in the literature use some form of change notification hints (or simply, “notifications”). A notification is an action that is sent to a remote client as the result of an update (or an impending update) that may impact the validity of an item cached at that client. Purging or updating a stale copy removes the risk that a subsequent transaction will be forced to abort as a result of accessing it. Notifications can be sent asynchronously at any time during the execution of an updating transaction, or even after such a transaction commits. 
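A minimal sketch of a notification path is shown below, with the server recording which clients hold copies and dispatching hints only after the updating transaction commits, a timing choice whose rationale is discussed just below. All names are invented for illustration.

```python
class NotifyingServer:
    """Sketch of post-commit change notification hints; names and structure are illustrative."""

    def __init__(self, clients):
        self.clients = clients                 # client_id -> remote client object
        self.copy_sites = {}                   # page_id -> set of client ids caching it

    def commit(self, client_id, write_set):
        # ... durable commit of write_set elided ...
        # Hints go out only after the commit point, matching the published algorithms.
        for page_id in write_set:
            for site in self.copy_sites.get(page_id, set()) - {client_id}:
                self.clients[site].on_hint(page_id)
        # They remain hints: a remote transaction may read a copy before the hint
        # arrives, so commit-time detection is still what guarantees correctness.

class HintedClient:
    def __init__(self):
        self.cache = {}                        # page_id -> cached contents

    def on_hint(self, page_id):
        # Remote update action: this sketch simply invalidates; Section 4.2.3 notes
        # that the new value could instead be propagated, or the choice made dynamically.
        self.cache.pop(page_id, None)
```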
In fact, sending notifications before commit can be dangerous if the notifications actually update the remote copies rather than simply removing them; if the transaction on whose behalf the notification was sent eventually aborts, then the remote updates will have to be undone, adding significant complexity (e.g., cascading aborts) and expense to the algorithm. Early notifications that simply purge copies from remote caches are less problematic; still, they too can cause unnecessary aborts at remote sites if active transactions have already accessed the invalidated copies there. Because of these complexities, all three of the algorithms shown in Figure 2 that use change notification hints send them only after the updating transaction has committed. 4.2.3 Remote Update Action. The final level of differentiation in the detection-based half of the taxonomy is concerned with the action taken when a notification arrives at a remote site. There are three options here: propagation, invalidation, and choosing dynamically between the two. Propagation results in the newly updated value being installed at the remote site in place of the stale copy. Invalidation, on the other hand, simply removes the stale copy from the remote cache so that it will not be accessed by any subsequent transactions. After a page copy is invalidated ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 332 • M. J. Franklin et al. at a site, any subsequent transaction that wishes to access the page at that site must obtain a new copy from the server. A dynamic algorithm can chose between invalidation and propagation heuristically in order to optimize performance for varying workloads. 4.3 Avoidance-Based Algorithms Avoidance-based algorithms form the other half of our taxonomy. The avoidance-based side of the taxonomy is shown in Figure 3. As stated previously, avoidance-based algorithms enforce consistency by making it impossible for transactions to ever access stale data in their local cache. They accomplish this by directly manipulating the contents of remote client caches as the result of (or prior to) client updates. Because consistency actions manipulate page copies in remote client caches, the client software must include additional mechanisms to support these actions (e.g., some schemes require that clients have a full function lock manager). In addition to their need for additional client support, avoidance-based algorithms also require extra information to be maintained at the server. Specifically, all of the avoidance-based algorithms described here require that the server keep track of the location of all page copies. In order to satisfy the “write all” requirement of the ROWA paradigm, it must be possible to locate all copies of a given page. One way to accomplish this is through the use of broadcast, as in the snooping protocols used in caching algorithms for small-scale multiprocessors [Goodman 1983]. Reliance on broadcast is not a viable option in a page server DBMS environment, however, due to cost and scalability issues. As a result, a “directory-based” approach [Agarawal et al. 1988] must be used. As discussed in Section 2, the server is the focal point for all transaction management functions and is responsible for providing clients with requested data; as a result, the avoidance-based algorithms covered here all maintain a directory of client page copies at the server. 
There are four levels in the avoidance-based half of the taxonomy, as shown in Figure 3: write intention declaration, write permission duration, remote conflict priority, and remote update action. Two of these dimensions, write intention declaration and remote update action, are analogous to dimensions that appeared in the detection-based side of the taxonomy. 4.3.1 Write Intention Declaration. As with the detection-based algorithms, the avoidance-based algorithms can be categorized according to the time at which transactions initiate consistency actions. The nature of their consistency actions, however, is somewhat different than in the detectionbased schemes. Because of the ROWA protocol, transactions executing under an avoidance-based scheme can always read any page copy that is cached at their local client. Thus, interaction with the server is required only for access to pages that are not cached locally and for updates to cached pages. Interactions with the server to obtain copies of noncached pages must, of course, be done synchronously. On a cache miss, the client requests the desired page from the server. When the server responds with a ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 333 copy of the page, it also implicitly gives the client a guarantee that the client will be informed if another client performs an operation that would cause the copy to become invalid. While all of the avoidance-based algorithms use the same policy for handling page reads, they differ in the manner in which consistency actions for updates are initiated. When a transaction wishes to update a cached page copy, the server must be informed of this write intention sometime prior to transaction commit so that it can implement the ROWA protocol. When the server grants write permission on a page to a client, it guarantees that, for the duration of the permission, the client can update that page without again having to ask the server.7 The duration of write permissions is addressed in Section 4.3.2. A write permission fault is said to occur when a transaction attempts to update a page copy for which it does not possess write permission. The taxonomy contains three options for when clients must declare their intention to write a page to the server: —Synchronous, on a write permission fault. —Asynchronous, initiated on a write permission fault. —Deferred, until the updating transaction enters its commit processing phase. In the first two options, clients contact the server at the time that they first decide to update a page for which they do not currently possess write permission. As in the detection-based case, such requests can be performed synchronously or asynchronously. In the third option, declarations of write intentions are deferred until the transaction finishes its execution phase (if the updated data can be held in the cache until then). The tradeoffs among synchrony, asynchrony and deferral for write intentions are similar in spirit to those previously discussed for the detectionbased algorithms: synchronous algorithms are pessimistic, deferred ones are optimistic, and asynchronous ones are a compromise between the two. The magnitude of these trade-offs, however, are quite different for avoidance-based algorithms. The global (ROWA) nature of these algorithms implies that consistency actions may be required at one or more remote clients before the server can register a write permission for a given client (or transaction). 
Therefore, consistency actions can involve substantial work. Furthermore, in avoidance-based algorithms the remote consistency operations are in the critical path of transaction commit; an update transaction cannot commit until all of the necessary consistency operations have been successfully completed at remote clients. These considerations tend to strengthen the case for deferral of consistency actions for avoidance-based algorithms. Of course, the cost of such deferral is a potential increase in the number of aborted transactions.

(Footnote 7: A "permission", while similar to a "write lock", differs in two significant ways. First, permissions are granted to client sites rather than to individual client transactions. Second, permissions are not subject to a two-phase constraint; i.e., they can be released and reacquired.)

4.3.2 Write Permission Duration. In addition to when write intentions are declared, avoidance-based algorithms can also be differentiated according to how long write permissions are retained. There are two choices at this level of the taxonomy: write permissions can be retained only for the duration of a particular transaction, or they can span multiple transactions at a given client. In the first case, transactions start with no write permissions, so they must eventually declare write intentions for all pages that they wish to update; at the end of the transaction, all write permissions are automatically revoked by the server. In the second case, a write permission can be retained at a client site until the client chooses to drop the permission or until the server asks the client to drop its write permission (in conjunction with the performance of a consistency action).

4.3.3 Remote Conflict Priority. The third level of differentiation for avoidance-based algorithms is the priority given to consistency actions when they are received at remote clients. There are two options here: wait and preempt. A wait policy states that consistency actions that conflict with the operation of an ongoing transaction at a client must wait for that transaction to complete. In contrast, under a preempt policy, ongoing transactions can be aborted as the result of an incoming consistency action. Under the wait policy, the guarantees that are made to clients regarding the ability to read cached page copies are somewhat stronger than under the preempt policy. This is because the wait policy forces a remote writer to serialize behind a local reader if a conflict arises, whereas writers always have priority over readers under the preempt policy, so conflicting readers are aborted.

4.3.4 Remote Update Action. The final level on the avoidance-based side of the taxonomy is based on how remote updates are implemented. The options here are the same as in the detection-based case, namely: invalidation, propagation, and choosing dynamically between the two. As stated previously, the propagation of updates to remotely cached copies can be problematic if consistency actions are sent to remote sites during a transaction's execution phase. As a result, all of the published algorithms in the taxonomy that send remote consistency actions during the execution phase rely on invalidation as the mechanism for handling updates remotely.
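The last two dimensions can be made concrete with a small sketch of how a remote client might service an incoming consistency action. The wait/preempt and invalidate/propagate branches correspond to Sections 4.3.3 and 4.3.4; the function signature and return values are assumptions made for the example, not part of any particular algorithm.

    def handle_consistency_action(cache, active_reads, page, new_value=None,
                                  priority="wait", action="invalidate"):
        """Apply a remote consistency action at a client (illustrative sketch).

        cache        : dict mapping page id -> cached value
        active_reads : set of page ids read-locked by transactions active here
        Returns "blocked" when a wait policy must let a local reader finish first;
        under a preempt policy the conflicting reader is assumed to be aborted.
        """
        if page in active_reads:
            if priority == "wait":
                return "blocked"            # tell the server the page is in use
            active_reads.discard(page)      # preempt: the local reader is aborted

        if action == "propagate" and new_value is not None:
            cache[page] = new_value         # install the new value over the stale copy
        else:
            cache.pop(page, None)           # invalidate: a later access must refetch
        return "applied"

    # Example: invalidating page 42 when no local transaction is reading it.
    cache = {42: "old value"}
    assert handle_consistency_action(cache, set(), 42) == "applied" and 42 not in cache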
An important difference between remote update actions under the avoidance-based algorithms and under the detection-based ones (discussed earlier) is that in the avoidance-based case, the remote operations are initiated and must be completed on behalf of a transaction before the transaction is allowed to commit. This is necessary to maintain the ROWA semantic guarantees that provide the basis for the correctness of avoidance-based algorithms. Therefore, if update propagation is used, all remote sites that receive the propagated update must participate in a two-phase commit with the server and the client at which the transaction is executing. In contrast, ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 335 invalidation does not require two-phase commit, as data is simply removed from the remote client caches in this case. 5. A PERFORMANCE COMPARISON OF THREE ALGORITHM FAMILIES The taxonomy presented in the previous section illuminates the wide range of options that have been explored by designers of transactional cache consistency maintenance algorithms. Recall that because the algorithms that have been proposed all provide the same functionality (i.e., they support one-copy serializability in the presence of dynamic caching), performance issues are a primary consideration in choosing among them. In this section we examine the performance implications of a number of the choices identified in the taxonomy. Our own work has focused primarily on algorithms from the avoidancebased half of the taxonomy [Carey et al. 1991; Franklin and Carey 1992]. In this section, we consolidate the results of those studies and reexamine their conclusions in the context of the design choices identified in the taxonomy. We first describe six candidate algorithms from three different algorithm families. We then provide an overview of a detailed simulation model and a set of four workloads used to examine the relative performance of those algorithms. Finally, performance results from a series of simulations are analyzed to shed light on the relevant design decisions from the taxonomy. The insights gained through this process are then used in Section 6 to reflect on the performance characteristics of the remaining design choices and algorithms in the taxonomy. 5.1 Algorithms The algorithms that we have focused on come from three families: Serverbased two-phase locking (S2PL), Callback Locking (CBL), and Optimistic 2PL (O2PL). The study of these three families was initially undertaken to develop alternatives for the SHORE object manager. We focused on these three families for the following reasons: S2PL algorithms have been used in a number of early systems including ORION and EXODUS. CBL is a logical extension of two-phase locking and has been used in the ObjectStore OODBMS; a variant of CBL was shown to have good performance in Wang and Rowe [1991]. Finally, the O2PL family is based on an algorithm that performed well in an earlier study of distributed database systems [Carey and Livny 1991]. While the algorithms in these three families differ in many ways, they all stem from the fundamental observation that because cached data is dynamically replicated data, techniques originally devised for managing replicated data can be adapted to manage cached copies. 
In the following, we briefly describe each of these three algorithm families (see Franklin [1993] for a more detailed description) and then identify pairs of algorithms that can be used to isolate the impact of a number of the design choices described in Section 4. 5.1.1 Server-Based Two-Phase Locking (S2PL). Server-based twophase locking algorithms are detection-based algorithms that validate ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 336 • M. J. Franklin et al. cached pages synchronously on a transaction’s initial access to the page. Server-based 2PL schemes are derived from the primary copy approach to replicated data management [Alsberg and Day 1976; Stonebraker 1979]. Before a transaction is allowed to commit, it must first access a specially designated copy (i.e., the primary copy) of each data item that it reads or writes. In a page server DBMS (with no server replication), the primary copy of any page is the one that resides at the server. For reads, the client’s copy of the page must be verified to have the same value as the server’s copy. For writes, the new value created by the transaction must be installed as the new value of the primary copy. The performance results examined here include an algorithm called Caching 2PL (C2PL). In C2PL, cache consistency is maintained using a “check-on-access” policy. All page copies are tagged with a version number that uniquely identifies the state of the page.8 When a transaction attempts a page access for which it has not obtained the proper lock (i.e., read or write), it sends a lock request to the server and waits for the server’s response. If the page is cache-resident at the client, then the cached copy’s version number is included in the lock request message. If any transactions hold conflicting locks, then the lock request blocks at the server until those locks are released. When the server grants a read lock to a client, it also determines whether or not the client has an up-to-date cached copy of the requested page. If not, then the server piggybacks a valid copy of the page on the lock response message returned to the client. C2PL uses strict two-phase locking—all locks are held until transaction commit or abort. Deadlocks are detected through a centralized scheme at the server, and are resolved by aborting the youngest transaction involved in the deadlock. C2PL is one of the simplest algorithms that supports intertransaction caching, and therefore, algorithms similar to C2PL have been implemented in several systems, including the ORION-1SX prototype [Kim et al. 1990] and the EXODUS storage manager [Exodus Project Group 1993]. An algorithm similar to C2PL has also been studied in Wang and Rowe [1991]. For comparison purposes, the performance study also includes results for an algorithm called Basic 2PL (B2PL) that allows only intratransaction caching. B2PL works similarly to C2PL, except that under B2PL, the client’s buffer pool is purged upon transaction termination. Since every transaction starts with an empty buffer pool, no page copies ever need to be validated with the server. Comparing the performance of B2PL to that of C2PL (and the other algorithms) isolates the degree of performance improvement that is due to intertransaction caching. 5.1.2 Callback Locking (CBL). Callback Locking algorithms are similar to C2PL, in that they are extensions of two-phase locking that support intertransaction page caching. 
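Before turning to the contrast with Callback Locking, the check-on-access behavior of C2PL can be made concrete with a short sketch. This is a minimal, single-server sketch that omits blocking on lock conflicts and deadlock handling; the class names and the use of a plain version number in place of a real LSN are assumptions made for the example.

    class C2PLServer:
        """Holds the primary copies; grants locks and version-checks cached copies."""

        def __init__(self, pages):
            self.pages = pages                 # page id -> (version, value)

        def lock_request(self, page, cached_version):
            version, value = self.pages[page]
            # Grant the lock; ship a fresh copy only if the client's copy is stale.
            return (version, value) if cached_version != version else None

    class C2PLClient:
        def __init__(self, server):
            self.server = server
            self.cache = {}                    # page -> (version, value), kept across transactions
            self.locks = set()                 # locks held by the current transaction

        def read(self, page):
            if page not in self.locks:         # first access: check with the server
                cached_version = self.cache[page][0] if page in self.cache else None
                reply = self.server.lock_request(page, cached_version)
                if reply is not None:          # copy was missing or out of date
                    self.cache[page] = reply
                self.locks.add(page)
            return self.cache[page][1]

        def end_transaction(self):
            self.locks.clear()                 # strict 2PL: locks go, cached pages stay

    server = C2PLServer({7: (1, "v1")})
    client = C2PLClient(server)
    client.read(7)            # lock request; the page itself is shipped
    client.end_transaction()
    client.read(7)            # lock request again, but only a version check is needed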
In contrast to the detection-based C2PL algorithm, however, Callback Locking algorithms are avoidance-based. 8 Data pages are typically tagged with such numbers, called Log Sequence Numbers (LSNs), in systems that use the Write-Ahead-Logging protocol for crash recovery [Gray and Reuter 1993]. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 337 Therefore, locally cached page copies are always guaranteed to be valid, so transactions can read them without contacting the server (i.e., only a local read lock is required). On a cache miss, the client sends a page request message to the server. The server returns a valid copy of the requested page when it determines that no other active clients believe they have write permission for the page. In Callback Locking, write intentions are declared synchronously—a client must have write permission on a page before it can grant a local write lock to a transaction. Because write permissions are obtained during transaction execution, transactions can commit after completing their operations without performing any additional consistency maintenance actions. We have studied two Callback Locking variants: Callback-Read (CB-R), where write permissions are granted only for the duration of a single transaction (i.e., they are treated like traditional write locks), and Callback-All (CB-A), where write permissions are retained at clients until being called back or until the corresponding page is dropped from the cache. Both variants use invalidation for handling remote updates. With Callback Locking (as with all avoidance-based algorithms), the server keeps track of the locations of cached copies throughout the system. Clients inform the server when they drop a page from their buffer pool by piggybacking that information on the next message that they send to the server. The server’s copy information is thus conservative—there may be some delay before the server learns that a page is no longer cached at a client. Transactions obtain locks from the local lock manager at the client site on which they execute. Read lock requests, as well as requests for write locks on pages for which the client has obtained write permission, can be granted immediately without contacting the server. Write lock requests on pages for which write permission has not yet been obtained cause a “write permission fault”. On a write permission fault, the client must register its write intention with the server and then wait until the server responds that the permission has been granted before continuing. When a write intention declaration arrives at the server, the server issues callback requests to all sites (except the requester) that hold a cached copy of the requested page. At a client, such a callback request is treated as a request for a write lock on the specified page. If the request cannot be granted immediately, due to a lock conflict with an active transaction, the client responds to the server by saying that the page is currently in use. When the callback request is eventually granted at the client, the page is removed from the client’s buffer and an acknowledgment message is sent to the server. When all callbacks have been acknowledged to the server, the server registers the write permission on the page for the requesting client and informs the client that it has done so. 
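The server-side handling of a write intention declaration just described might be organized roughly as shown below. The sketch is single-threaded and issues callbacks one at a time, whereas a real implementation would send them in parallel and block the requester while any copy is in use; the names are ours, not those of any of the systems cited in this article.

    class CallbackClient:
        def __init__(self):
            self.cache = set()          # pages cached at this client
            self.read_locked = set()    # pages read-locked by active local transactions

        def callback(self, page):
            if page in self.read_locked:
                return False            # "page in use": the server must wait
            self.cache.discard(page)    # drop the copy and acknowledge
            return True

    class CallbackServer:
        def __init__(self):
            self.copies = {}            # page -> set of clients caching a copy
            self.write_perms = {}       # page -> client holding write permission

        def register_copy(self, page, client):
            self.copies.setdefault(page, set()).add(client)

        def declare_write_intention(self, page, requester, clients):
            for site in list(self.copies.get(page, set()) - {requester}):
                if not clients[site].callback(page):
                    return False        # a callback is blocked; the real server waits here
                self.copies[page].discard(site)
            self.write_perms[page] = requester   # all callbacks acknowledged
            return True                 # the requester is informed and may proceed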
Any subsequent read or write requests for the page by transactions from other clients will then be blocked at the server until the write permission is released by the holding client or else revoked by the server. If a read request for a page arrives at the server and a write permission for the page is currently registered for some other client, then the server ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 338 • M. J. Franklin et al. action is algorithm-dependent. Under Callback-Read (CB-R), where Write Permission Duration is only until the end of a transaction, the read request is simply blocked at the server until the termination of the current transaction at the client holding the permission. In contrast, under Callback-All (CB-A), the server sends a downgrade request to that client. A downgrade request is similar to a callback request, but rather than responding by removing the page from its buffer, the client simply acknowledges to the server that it no longer has write permission on the page. At a remote client, a downgrade request for a page copy must first obtain a read lock on the page in order to ensure that no transactions active at the client are currently holding write locks on the page. The downgrade request blocks at the client if a conflict is detected, in which case a message is sent to the server informing it of the conflict. Global deadlocks can arise due to callback and downgrade requests. These deadlocks can always be detected at the server, however, because clients inform the server when they block such requests. As in the S2PL algorithms, deadlocks are resolved by aborting the youngest involved transaction. At the end of a transaction, the client sends copies of any cached updated pages to the server. This is done only to simplify recovery, as no other sites can access a page while it is cached with write permission at a site. Thus, it is technically possible to avoid sending a copy of a dirty page back to the server until the write permission on the page is downgraded or the page is dropped [Franklin et al. 1993]. Callback-style algorithms originated in the operating systems community for maintaining cache consistency in distributed file systems such as Andrew [Howard et al. 1988] and Sprite [Nelson et al. 1988], both of which provide weaker forms of consistency than that required by database systems. More recently, a Callback Locking algorithm that provides transaction serializability has been employed in the ObjectStore OODBMS [Lamb et al. 1991]. An algorithm similar to CB-R was also studied in Wang and Rowe [1991]. 5.1.3 Optimistic Two-Phase Locking (O2PL). The third family of caching algorithms that we have studied is Optimistic Two-phase Locking (O2PL). The O2PL algorithms are derived from a concurrency control protocol that was originally developed for replicated distributed databases [Carey and Livny 1991]. The O2PL algorithms are avoidance-based, but they are more “optimistic” than Callback Locking because they defer write intention declaration until the end of a transaction’s execution phase. We have developed and analyzed several O2PL variants that differ in their implementation of remote update actions. In this article we focus on two such variants: O2PL-Invalidate (O2PL-I), which always invalidates remote copies, and O2PL-Propagate (O2PL-P), which always propagates updated page copies to remote clients that are caching the updated pages. 
Under O2PL, each client has a local lock manager from which the proper lock must be obtained before a transaction can access a data item at that client. No locks are obtained at the server during the execution phase of a transaction. (Actually, a non-two-phase read lock, i.e., a latch, is obtained briefly at the server when a data item is in the process of being prepared for shipment to a client, to ensure that the client is given a transaction-consistent copy of the item.) Transactions update pages in their local cache, and these updated pages are retained at the client (unless they are aged out) until the transaction enters its commit phase. When an updating transaction is ready to enter its commit phase, it sends a message to the server containing the new copies of such pages. The server then acquires exclusive locks on these pages on behalf of the finishing transaction. The locks obtained at the server are held until the transaction completes, as they will allow the server to safely install the new page values. Once the required locks have been obtained at the server, the server sends a message to each client that has cached copies of any of the updated pages. These remote clients obtain exclusive locks on their local copies (if present) of the updated pages on behalf of the committing transaction. If any of their transactions currently holds a read lock on a local copy, then the update transaction will have to wait for the reader transaction(s) to complete before it can continue commit processing.

Once all of the required locks have been obtained at a remote site, that site performs consistency actions on its copies of the updated pages. Under O2PL-I, the client simply purges its copies of the updated pages, releases its local locks on those pages, and then sends an acknowledgment message to the server—a two-phase commit protocol is not necessary in this case. In contrast, under O2PL-P, remote clients must enter a two-phase commit protocol with the server in order to ensure that the updates to all remote copies happen atomically. First, each client sends a message to the server indicating that it has obtained the necessary local locks. This message acts as the "prepared" message of the commit protocol. When the server has heard from all involved clients, it sends copies of the updated pages to those sites. These messages initiate the second phase of the commit protocol. Upon receipt of the new page copies, the clients install them in their buffer pools and then release the locks on those pages. (It should be noted that the receipt of a propagated page copy at a client does not affect the page's LRU status at that site.)

Because O2PL is distributed and locking-based, distributed deadlocks can arise in O2PL-I and O2PL-P. Each client therefore maintains a local waits-for graph, which is used to detect deadlocks that are local to that client. Global deadlocks are detected using a centralized algorithm in which the server periodically requests local waits-for graphs from the clients and combines them to build a global waits-for graph. (Note that deadlocks involving consistency actions can be resolved early, rather than waiting for periodic detection, as any conflict detected between two consistency actions or between a consistency action and an update will ultimately result in a deadlock [Carey and Livny 1991].) As in the previously described algorithms, deadlocks are resolved by aborting the youngest transaction involved.
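The commit-time processing just described might look roughly as follows. The sketch collapses the message exchanges into direct calls and omits the server-side locking, deadlock handling, and the full two-phase commit bookkeeping, so it is only intended to show where O2PL-I and O2PL-P diverge; all names are invented for the example.

    class RemoteSite:
        """Minimal stand-in for a remote client in the sketch."""
        def __init__(self):
            self.cache = {}
        def prepare(self, page):            # obtain local locks; acts as the "prepared" vote
            return True
        def install(self, page, value):     # second phase under O2PL-P: install the new copy
            self.cache[page] = value
        def purge(self, page):              # O2PL-I: drop the stale copy and acknowledge
            self.cache.pop(page, None)

    def o2pl_commit(server_pages, directory, sites, writer, updates, propagate=False):
        """Commit-phase handling for O2PL-I (propagate=False) or O2PL-P (True).

        updates maps page id -> new value; directory maps page id -> set of site ids.
        """
        for page, value in updates.items():
            remotes = directory.get(page, set()) - {writer}
            if propagate:
                # O2PL-P: remote sites prepare before the new copies are installed,
                # so that the remote updates happen atomically (two-phase commit).
                for s in remotes:
                    sites[s].prepare(page)
                for s in remotes:
                    sites[s].install(page, value)
            else:
                # O2PL-I: simply invalidate remote copies; no two-phase commit needed.
                for s in remotes:
                    sites[s].purge(page)
                    directory[page].discard(s)
            server_pages[page] = value      # install the new value at the server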
5.1.4 Evaluating the Tradeoffs. The three families of cache consistency maintenance algorithms described in the preceding sections cover a number of the design alternatives identified in the taxonomy presented in Section 4. As stated previously, the focus of our work has been on avoidance-based algorithms, so the majority of the tradeoffs investigated come from that side of the taxonomy. However, because several of the avoidance-based dimensions have analogs on the detection-based side of the taxonomy, the comparisons presented here can shed light on a number of the detection-based tradeoffs as well. The algorithms that are not directly addressed in this section will be discussed in Section 6. Table I summarizes the portion of the design space covered in this study and shows which algorithms can be compared in order to examine the performance tradeoffs implied by a given decision point in the taxonomy.

Table I. Design Choices and Relevant Comparisons

    Design Choice                 Algorithms to Compare
    Invalid Access Prevention     C2PL (Detection) vs. CB-A (Avoidance)
    Write Intention Declaration   CB-R (Synchronous) vs. O2PL-I (Deferred)
    Write Permission Duration     CB-R (Single Transaction) vs. CB-A (Until Revoked or Dropped)
    Remote Update Action          O2PL-I (Invalidation) vs. O2PL-P (Propagation)

Invalid Access Prevention (C2PL vs. CB-A). As described in Section 4, the top-level design choice in the taxonomy is the policy used for preventing access to invalid data. Detection requires the validity of all accessed data to be explicitly confirmed prior to transaction commit, while Avoidance ensures that transactions never have the opportunity to access stale data. The S2PL algorithms are detection-based, whereas the CBL and O2PL algorithms are all avoidance-based. Among these algorithms, comparing the performance of C2PL and the CBL algorithms can provide the clearest insights into this trade-off. These algorithms all allow intertransaction caching, do not propagate updated pages, and initiate their consistency actions synchronously. Of the two CBL algorithms, CB-A provides the strongest contrast with C2PL because it retains both read and write permissions. Comparing C2PL with CB-R is also useful because CB-R is avoidance-based but requires obtaining write permissions from the server in the same manner as C2PL.

Write Intention Declaration (CB-R vs. O2PL-I). For avoidance-based algorithms, the next level of differentiation is concerned with the timing of Write Intention Declarations. As described in Section 4.3.1, avoidance-based algorithms can be pessimistic and require update transactions to declare their write intentions synchronously when a permission fault occurs, or they can be more optimistic and allow the deferral of these declarations until the update transaction enters its commit phase. The CBL algorithms belong to the pessimistic or synchronous camp, while the O2PL algorithms are more optimistic. Comparing the performance of CB-R and O2PL-I provides insight into this tradeoff, as both algorithms retain write permissions only until the end of a transaction and both use invalidation for remote update actions.
Write Permission Duration (CB-R vs. CB-A). The next level of choice for avoidance-based algorithms is that of Write Permission Duration. As discussed in Section 4.3.1, write permissions can be associated with a single transaction, or they can be retained by a client site across multiple transactions. The tradeoffs between these two approaches can be directly observed by comparing the performance of CB-R and CB-A, which differ only in this aspect.

Remote Update Actions (O2PL-I vs. O2PL-P). The final choice to be examined here is that of the action performed on remote copies of updated pages. As stated in Section 4.3.4, two options are invalidation, which purges such copies from remote caches, and propagation, which sends new valid copies of such pages to the remote sites that contain cached copies of them. Comparing the performance of O2PL-I and O2PL-P, which differ only in this respect, will help to isolate the tradeoffs between these two options.

5.2 A Client-Server Performance Model

5.2.1 The System Model. Figure 4 shows the structure of our simulation model, which was constructed using the DeNet discrete event simulation language [Livny 1990]. It consists of components that model diskless client workstations and a server machine (with disks) that are connected over a simple network. Each client site consists of a Buffer Manager that uses an LRU page replacement policy, a Concurrency Control Manager that is used either as a simple lock cache or as a full-function lock manager (depending on the cache consistency algorithm in use), a Resource Manager that provides CPU service and access to the network, and a Client Manager that coordinates the execution of transactions at the client. Each client also has a Transaction Source, which initiates transactions one-at-a-time at the client site according to the workload model described in the next subsection. Upon completion of one transaction, the source waits for a specified think time and then submits the next transaction. If a transaction aborts, it is resubmitted with the same page reference string. The number of client machines is a parameter to the model.

Fig. 4. Performance model of a client-server DBMS.

Table II. System and Overhead Parameter Settings

    Parameter          Meaning                                 Setting
    PageSize           Size of a page                          4,096 bytes
    DatabaseSize       Size of database in pages               1250
    NumClients         Number of client workstations           1 to 25
    ClientCPU          Instruction rate of client CPU          15 MIPS
    ServerCPU          Instruction rate of server CPU          30 MIPS
    ClientBufSize      Per-client buffer size                  5% or 25% of DB
    ServerBufSize      Server buffer size                      50% of DB size
    ServerDisks        Number of disks at server               2 disks
    MinDiskTime        Minimum disk access time                10 milliseconds
    MaxDiskTime        Maximum disk access time                30 milliseconds
    NetBandwidth       Network bandwidth                       8 or 80 Mbits/sec
    FixedMsgInst       Fixed no. of inst. per message          20,000 instructions
    PerByteMsgInst     No. of addl. inst. per msg. byte        10,000 inst. per 4Kb
    ControlMsgSize     Size in bytes of a control message      256 bytes
    LockInst           Instructions per lock/unlock pair       300 instructions
    RegisterCopyInst   Inst. to register/unregister a copy     300 instructions
    DiskOverheadInst   CPU overhead to perform I/O             5000 instructions
    DeadlockInterval   Global deadlock detection frequency     1 second (for O2PL)

The server machine is modeled similarly to the clients, but with the following differences: First, the server's Resource Manager manages disks as well as a CPU.
Second, its Concurrency Control Manager has the ability to store information about the location of page copies in the system and also manages locks. Third, there is a Server Manager component that coordinates the server's operation; this is analogous to the client's Client Manager. Finally, there is no Transaction Source module at the server, since all transactions originate at client workstations.

Table II describes the parameters that are used to specify the system resources and overheads and shows the settings used in this study. We used a relatively small database in order to make simulations involving fractionally large buffer pools feasible in terms of simulation time. The most important factor here is the ratio of the transaction and client-server buffer pool sizes to the database size, not the absolute database size itself. (Results demonstrating the scalability of the simulator when the database size and buffer sizes are increased by an order of magnitude and the transaction length is increased accordingly are described in Carey et al. [1994].)

The simulated CPUs of the system are managed using a two-level priority scheme. System CPU requests, such as those for message and disk handling, are given priority over user (client transaction) requests. System CPU requests are handled using a FIFO queuing discipline, while a processor-sharing discipline is employed for user requests. Each disk has a FIFO queue of requests; the disk used to service a particular request is chosen uniformly from among all the disks at the server. The disk access time is drawn from a uniform distribution between a specified minimum and maximum.

A very simple network model is used in the simulator's Network Manager component; the network is modeled as a FIFO server with a specified bandwidth. We did not model the details of the operation of a specific type of network (e.g., Ethernet, token ring, etc.). Rather, the approach we took was to separate the CPU costs of messages from their on-the-wire costs, and to allow the on-the-wire message costs to be adjusted using the network bandwidth parameter. The CPU cost for managing the protocol to send or receive a message is modeled as a fixed number of instructions per message plus an additional charge per message byte.

5.2.2 Client Workloads. Our simulation model provides a simple but flexible mechanism for describing client workloads. The access pattern for each client can be specified separately using the parameters shown in the first column of Table III.

Table III. Workload Parameter Settings for Client n

    Parameter      PRIVATE                HOTCOLD                UNIFORM       FEED
    TransSize      16 pages               20 pages               20 pages      5 pages
    HotBounds      p to p+24,             p to p+49,             —             1 to 50
                   p = 25(n-1)+1          p = 50(n-1)+1
    ColdBounds     626 to 1,250           rest of DB             all of DB     rest of DB
    HotAccProb     0.8                    0.8                    —             0.8
    ColdAccProb    0.2                    0.2                    1.0           0.2
    HotWrtProb     0.2                    0.2                    —             1.0/0.0
    ColdWrtProb    0.0                    0.2                    0.2           0.0/0.0
    PerPageInst    30,000                 30,000                 30,000        30,000
    ThinkTime      0                      0                      0             0

Transactions are represented as a string of page access requests in which some accesses are for reads and others are for writes. Two ranges of database pages can be specified: hot and cold. The probability of a page access being to a hot range page is specified; the remainder of the accesses are directed to cold range pages. For both ranges, the probability that an access to a page in the range will involve a write (in addition to a read) is specified.
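To illustrate how these parameters define a client's reference string, the sketch below generates one transaction's accesses from a hot/cold specification in the style of Table III. This is an illustrative Python sketch only; the actual simulator is written in DeNet, and the function name and argument order are our own.

    import random

    def generate_transaction(trans_size, hot_bounds, cold_bounds,
                             hot_acc_prob, hot_wrt_prob, cold_wrt_prob, rng=random):
        """Return one transaction as a list of (page, is_write) references."""
        refs = []
        for _ in range(trans_size):
            if hot_bounds is not None and rng.random() < hot_acc_prob:
                (lo, hi), write_prob = hot_bounds, hot_wrt_prob
            else:
                (lo, hi), write_prob = cold_bounds, cold_wrt_prob
            page = rng.randint(lo, hi)                  # uniform within the chosen range
            refs.append((page, rng.random() < write_prob))
        return refs

    # Example: the PRIVATE settings for client n = 3 (hot pages 51 to 75).
    n = 3
    p = 25 * (n - 1) + 1
    refs = generate_transaction(16, (p, p + 24), (626, 1250), 0.8, 0.2, 0.0)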
The parameters also allow the specification of the average number of instructions to be performed at the client for each page read or write, once the proper lock has been obtained. Table III summarizes the workloads that are examined here. The PRIVATE workload has a per-client private hot region that is read and written by each client and a shared cold region that is accessed in a read-only manner by all clients. This workload is intended to model an environment such as a large CAD system, where each user has a portion of the design on which they work while accessing additional design parts from a shared library of components. The HOTCOLD workload has a high degree of locality per client and a moderate amount of sharing and data contention among clients. UNIFORM is a low-locality, moderate-write probability workload used to examine the consistency algorithms in a case where caching is not expected to pay off significantly. This workload has a higher ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 344 • M. J. Franklin et al. level of data contention than HOTCOLD. Finally, the FEED workload represents an application involving a highly directional information flow, such as one might expect in a stock quotation system; one site produces data while all the other sites consume it. 5.3 Experiments and Results In this section we present results from performance experiments involving the algorithms described in Section 5.1. The main performance metric presented is system throughput (measured in transactions per second).13 The throughput results are, of course, dependent on the particular settings chosen for the various physical system resource parameters. For example, the relative performance of the algorithms in a disk-bound system may differ greatly from that in a CPU-bound system. Thus, while the throughput results show performance characteristics in what we consider to be a reasonable environment, we also present various auxiliary performance measures, such as message and disk I/O counts, to provide additional insights into the fundamental trade-offs among the algorithms. Auxiliary metrics that are presented as “per commit” values are computed by taking the total count for the given metric (e.g., the total number of messages routed through the network) over the duration of the simulation run and then dividing by the number of transactions that committed during that run. As a result, these averages also take into account work that was done on behalf of aborted transactions. To ensure the statistical validity of the results presented here, we verified that the 90% confidence intervals for transaction response times (computed using batch means) were sufficiently tight. The size of the confidence intervals was within a few percent of the mean in all cases, which is more than sufficient for our purposes. In the sections that follow, we focus on a system configuration in which each client has a large cache (25% of the active database size) and the network bandwidth is set to the lower value in Table II (8 Mbits/sec). The network speed was chosen to approximate the speed of an Ethernet, reduced slightly to account for bandwidth lost to collisions, etc. The large client cache size tends to reduce the performance impact of server I/O. Therefore, the combination of these settings tends to emphasize the performance impact of message behavior, which plays a role in all four of the tradeoffs listed in Table I. 
However, I/O and transaction aborts also factor into the comparisons and will be discussed where appropriate. Finally, although this section focuses on results from a limited set of experiments, it should be emphasized that we have run numerous experiments with a variety of different parameter settings and workloads. Many of these experiments are described in Carey et al. [1991], Franklin and Carey [1992], and Franklin [1993]. 13 Because the simulation uses a closed queuing model, throughout and response time are equivalently informative metrics. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency Fig. 5. • 345 Throughput (Private, 25% client cache, slow net). 5.3.1 The PRIVATE Workload. We first examine performance results for the PRIVATE workload. As described in Section 5.2.2, PRIVATE has a high degree of locality per client, and it has no read-write or write-write data sharing. Figure 5 shows the total system throughput for the PRIVATE workload as the number of clients in the system is increased from 1 to 25. In this experiment, the invalidation-based O2PL algorithms and Callback locking algorithms provide the best performance. The detection-based C2PL algorithm has lower throughput, and B2PL, which does not allow intertransaction caching, has the poorest performance by a significant margin. In this experiment (and in most of the others), B2PL suffers due to high message volumes and server disk I/O because it is unable to exploit client memory for storing data pages across transaction boundaries. In order to see what insight these results can provide into the performance trade-offs for this workload, it is helpful to examine pairs of algorithms as discussed in Section 5.1.4. The first trade-off we examine here is based on the choice of Invalid Access Prevention. In this experiment, the avoidance-based algorithms all significantly outperform the detection-based C2PL algorithm throughout the range of client populations. This behavior is due to the server CPU overhead and the path-length resulting from the number of messages sent per transaction. Focusing on C2PL and CB-A, as can be seen in Figure 6, C2PL requires nearly 40 messages per transaction (on average) in this experiment, while CB-A requires only 12. This difference is because the pessimistic, detection-based ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 346 • M. J. Franklin et al. Fig. 6. Messages sent/commit (Private, 25% client cache, slow net). C2PL algorithm sends a message to the server on every initial page access— even for pages that it has cached locally. In fact, C2PL sends the same number of messages as the noncaching B2PL algorithm, though it sends many fewer bytes because fewer of the replies from the server contain page copies than for B2PL. This difference in message requirements in the absence of data contention is one reason why most of the detection-based algorithms that have been proposed (see Figure 2) include some amount of optimism. The next design decision to be examined is the choice of Write Intention Declaration timing. Because of the lack of read-write and write-write sharing in this workload, however, this choice has only a minor impact on performance here. As can be seen in Figure 5, CB-R performs only slightly below O2PL-I under this workload. 
With no data conflicts, write intention declarations only require a round trip message to the server as no remote clients ever need to be contacted, so O2PL-I gains only a small savings in messages by deferring its write intention declarations until commit time. Returning to the message counts shown in Figure 6, it can be seen that while the message requirements for CB-R remain constant as clients are added, there is a slight rise in the message requirements for O2PL-I. This rise is due to the cost of distributed deadlock detection, which is not required by CB-R. Finally, it should be noted that the absence of data conflicts means that the differences in abort rates between pessimism (CB-R) and optimism (O2PL-I) are simply not an issue for this workload. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency Fig. 7. • 347 Throughput (HOTCOLD, 25% client cache, slow net). The trade-offs for the choice of Write Permission Duration can be seen by comparing the performance and message sending behavior of CB-A, which retains permissions across transaction boundaries, and CB-R, which gives up a write permission when the transaction that obtained it completes. Under the PRIVATE workload, CB-A declares a write intention on a page copy at most once for the duration of the page copy’s residence in the cache, as permissions are never called back by remote clients under this workload. Thus, CB-A consistently sends fewer messages than CB-R. This results in a message savings and a slight throughput advantage for CB-A in this case (in fact, CB-A performs as well as O2PL-I does here). Finally, it should be noted that the choice of Remote Update Action does not impact performance under the PRIVATE workload. This is again due to the absence of read-write and write-write sharing. No remote updates ever occur, so O2PL-I and O2PL-P provide similar throughput here. 5.3.2 The HOTCOLD Workload. Figure 7 shows the throughput results for the HOTCOLD workload with the large client caches and slow network. As described in Section 5.2.2, HOTCOLD has high locality per client, but unlike the PRIVATE workload, it also has read-write and write-write sharing among the clients. Despite this sharing, however, the relative throughput for each of the algorithms (except for O2PL-P, which is discussed below) is similar to what was observed in the PRIVATE case. That is, the avoidance-based algorithms perform better than C2PL, and the noncaching B2PL algorithm has the worst performance. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 348 • M. J. Franklin et al. Fig. 8. Messages sent/commit (HOTCOLD, 25% client cache, slow net). The introduction of read-write and write-write sharing raises several additional tradeoffs for cache consistency maintenance. Many of the tradeoffs can be seen in Figure 8, which shows the number of messages sent per committed transaction. The first trade-off that we discuss is that between detection-based and avoidance-based Invalid Access Prevention. As can be seen in the figure, the message counts for the detection-based C2PL algorithm are independent of the number of clients in this case, while the avoidance-based algorithms all send more messages per transaction as clients are added (unlike under the PRIVATE workload results of Figure 6). The additional messages used by the avoidance-based algorithms are for implementing remote update actions (callbacks, invalidations, propagations, etc.) at clients that possess cached copies of affected pages. 
As more clients are added, the number of cached copies for any given page increases, so more messages for remote update actions are required. However, it should be noted that the number of messages sent by the avoidance-based CB-A algorithm remains substantially lower than for the detection-based C2PL algorithm throughout the range of client populations explored in this experiment. The next tradeoff of interest involves Write Intention Declaration. The tradeoffs between declaring Write Intentions synchronously, during transaction execution (as in CB-R), or deferring such declarations until transaction commit (as in O2PL-I) are slightly different under HOTCOLD than they were under the PRIVATE workload. Comparing the number of mesACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency • 349 sages sent under HOTCOLD (Figure 8) and under PRIVATE (Figure 6), the difference between CB-R and O2PL-I is greater here than under PRIVATE for two reasons: first, each transaction updates more pages under HOTCOLD than under PRIVATE, and second, some intention declarations result in Remote Update Actions here. Since CB-R declares intentions one-at-a-time, multiple declaration messages are sent, and it is possible that multiple callback requests will be sent to some remote clients during a transaction. In contrast, by deferring Write Intention Declarations, O2PL-I sends only a single declaration message to the server, which in turns sends at most one request for Remote Update Actions to each remote client. This difference has only a small impact on throughput here, and that impact disappears as more clients are added and the server disks become the dominant resource. Finally, while the Write Intention Declaration decision also impacts the transaction abort rate (as discussed in Section 5.1.4), the abort rate does not play a significant factor in this experiment due to a fairly low level of data contention. The trade-offs involving Write Permission Duration are affected in an interesting way by the introduction of read-write and write-write sharing, as can be seen by comparing the message behavior of CB-R and CB-A in Figure 8. With fewer clients, CB-R, which gives up write permissions at the end of a transaction, sends more messages than CB-A, which retains permissions across transaction boundaries. However, as clients are added, the amount of sharing increases; more write permission callbacks occur, so the number of messages sent by CB-A increases at a faster rate than for CB-R. CB-A has higher message requirements than CB-R at 15 clients and beyond. Due to the fact that the disk becomes the dominant resource in this region, however, the two Callback algorithms deliver similar performance. Finally, the choice of Remote Update Action has a very significant impact in this experiment, due to presence of read-write and write-write sharing. In contrast to the invalidation-based O2PL-I algorithm, O2PL-P suffers a substantial degradation in performance beyond five clients; it eventually performs even below the level of C2PL. The reason for O2PL-P’s poor performance in this case is a dramatic increase in message volume as clients are added. At 25 clients, O2PL-P sends almost three times more data through the network (about 120 Kbytes per commit) than O2PL-I (which sends about 43 Kbytes per commit). This increase is due to the messages needed by O2PL-P to propagate updated pages to remote clients. 
At 25 clients, it sends propagations to an average of 13 remote clients per transaction. Furthermore, the vast majority of these propagations are "wasted"—that is, the remote copies are either propagated to again, or else dropped from the cache, before the previous propagated value is ever actually used. It should be noted that the large number of involved sites is due to the large client caches; when O2PL-P is used with smaller client caches, wasted propagations are reduced, as unimportant pages tend to be quickly pushed out of the caches before another propagation occurs. This experiment demonstrates that using propagation to implement Remote Update Actions is a rather dangerous policy. Its performance is very sensitive to the size of the client caches, and it can more than nullify the performance benefits of caching in some cases.

5.3.3 The UNIFORM Workload. The third workload that we examine is the UNIFORM workload, which has no per-client locality and as a result has a higher level of data contention and benefits less from caching than the HOTCOLD workload. The throughput results and message counts are shown in Figures 9 and 10, respectively.

Fig. 9. Throughput (UNIFORM, 25% client cache, slow net).
Fig. 10. Messages sent/commit (UNIFORM, 25% client cache, slow net).

In terms of throughput, UNIFORM's combination of no locality and high data contention reduces the magnitude of the performance differences among the caching algorithms. In terms of message counts, Figure 10 shows that the tradeoffs for Invalid Access Prevention are somewhat different here than in the previous cases. As in the HOTCOLD workload, the number of messages required by the avoidance-based algorithms increases with the number of clients, whereas the requirements of C2PL remain nearly constant (increasing slightly due to aborted transactions). Unlike the HOTCOLD case, however, all of the avoidance-based algorithms require more messages than the detection-based C2PL algorithm beyond 5–10 clients.

To understand why detection leads to fewer messages than avoidance in this low-locality situation, it is useful to examine the message tradeoffs made by the avoidance-based algorithms. Under CB-A (as well as the other avoidance-based algorithms), the permission to read a page is effectively cached along with the page. Thus, when a client wishes to read a page that it has in its cache, it can do so without contacting the server. Compared to C2PL, this saves two messages per initial read. However, if some remote client wishes to write a page that a client has cached, then a callback message will arrive and a reply must be sent. This is a net cost of two messages per write compared to C2PL. Furthermore, CB-A has analogous tradeoffs for pages on which write permissions are retained: it enjoys a savings of two messages if the page is written locally, and pays a price of two messages if the page is read remotely. CB-A's avoidance approach therefore yields a net loss if a page is less likely to be read locally than written remotely, and the retention of write permissions is a net loss if the page is less likely to be written locally than read remotely. The absence of locality in the UNIFORM workload means that both of the tradeoffs made by CB-A become net losses as more clients are added.
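This two-messages-saved versus two-messages-paid accounting can be written down directly. The helper below is only a back-of-the-envelope sketch under the simplifying assumption that every avoided validity check and every callback or downgrade costs exactly one round trip (two messages); the event counts in the example are invented for illustration.

    def cba_message_delta(local_cache_reads, remote_write_callbacks,
                          local_writes_with_permission, remote_reads_of_held_pages):
        """Messages saved by CB-A relative to C2PL (negative means CB-A sends more)."""
        read_side = 2 * local_cache_reads - 2 * remote_write_callbacks
        write_side = 2 * local_writes_with_permission - 2 * remote_reads_of_held_pages
        return read_side + write_side

    # With locality, local hits dominate and cached permissions pay off:
    assert cba_message_delta(12, 2, 3, 1) > 0
    # Without locality, callbacks and downgrades catch up as clients are added:
    assert cba_message_delta(2, 4, 1, 3) < 0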
Similar tradeoffs are made by the other avoidance-based algorithms. In addition to messages, the choice of an Invalid Access Prevention technique also has an impact on I/O requirements. Figure 11 shows the average hit rate across all client caches for the UNIFORM workload.14 As can be seen in the figure, the avoidance-based algorithms all have higher client cache hit rates than C2PL. In this experiment, the O2PL algorithms have inflated client buffer hit rates due to the reexecution of aborted 14 A cache request results in a hit only if a valid copy of the page id found in the local cache. ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 352 • M. J. Franklin et al. Fig. 11. Client hit rate (UNIFORM, 25% client cache, slow net). transactions (as indicated in Figure 12). However, while CB-A has a nearly identical abort rate to C2PL, it has a noticeably better client hit rate. The reason for this difference is that size of the client caches under C2PL is effectively smaller than under the avoidance-based algorithms because of the presence of invalid pages. These invalid pages consume cache space that could otherwise be used for holding valid pages. In contrast, since the avoidance-based algorithms remove pages from client caches as they become invalid, they allow the entire cache to be used for valid pages. This effect is greatest in the range of 10 –15 clients here. Beyond this point, CB-A incurs an increasing rate of page invalidations due to the large number of clients. These invalidations reduce the extent to which CB-A is able to utilize the client caches; beyond a client population of 15, significant numbers of client cache slots simply remain empty under CB-A. The increased data contention of the UNIFORM workload also brings out the downside of the optimistic approach of deferring Write Intention Declarations. As can be seen in Figure 12, the semioptimistic O2PL-I algorithm aborts as many as 0.4 transactions for every transaction that it commits in this experiment. In comparison, the pessimistic CB-R algorithm aborts about one third as many transactions. Interestingly, despite this difference, O2PL-I obtains roughly 10% higher throughput than CB-A (see Figure 9). This is because the cost of aborts in this experiment is rather low due to cache hits that occur when aborted transactions run again. However, as shown in Franklin and Carey [1992] and Franklin [1993], the high abort ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency Fig. 12. • 353 Aborts/commit (UNIFORM, 25% client cache, slow net). rate of O2PL-I can cause it to have significantly worse performance than CB-R if data contention is further increased. In terms of the Write Permission Duration, the high degree of sharing and lack of locality of the UNIFORM workload results in CB-R sending fewer messages than CB-A across the range of client populations (Figure 10) and thereby having a slight performance advantage across most of the range (Figure 9). As discussed previously, retaining Write Permissions is a net loss for CB-A if a page is less likely to be written locally than it is to be read remotely. The lack of locality and the update probability in the UNIFORM workload thus work to the disadvantage of CB-A. The effects of using propagation for Remote Update Actions are similar here to those seen in the HOTCOLD workload. In this case, however, O2PL-P ultimately performs worse than even B2PL, which does no intertransaction caching. 
Although propagation does give O2PL-P a slight advantage in terms of the client hit rate (Figure 11), the cost of sending propagations that go unused is much higher here than the benefit gained from those propagations that are indeed eventually used. 5.3.4 The FEED Workload. The last workload to be examined here is the FEED workload. As discussed in Section 5.2.2, FEED is intended to model an information service environment, such as a stock quotation system, where many clients read data from an information source. In this workload, one client acts as the source, reading and updating pages, while the remainder of the clients act as consumers, only reading the data. We ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. 354 • M. J. Franklin et al. Fig. 13. Throughput (FEED, 25% client cache, slow net). include this workload primarily to demonstrate a case where using propagation as the Remote Update Action can be beneficial; thus, we focus only on the performance of O2PL-I and O2PL-P here. Figure 13 shows the throughput results for O2PL-I and O2PL-P under this workload. The dashed lines show the throughput of the source (Client #1) while the solid lines show the aggregate throughput of the remaining clients. In this workload, O2PL-P significantly outperforms O2PL-I. The reason for this is that propagation gives the consumers a much higher client cache hit rate, as is shown in Figure 14. This improvement in hit rate reduces the path length of the reader transactions. Furthermore, due to the high degree of client access locality in this workload, many fewer propagations are wasted than in the UNIFORM workload. 5.4 Summarizing The Results In the preceding sections, we compared the performance of six different cache consistency maintenance algorithms. Here, we briefly review the results in terms of the insight that they offer regarding the design tradeoffs for transactional cache consistency maintenance algorithms. The tradeoffs between using avoidance and detection for Invalid Access Prevention were examined by comparing C2PL and CB-A. Both of these algorithms are pessimistic, so the avoidance/detection choice was seen to have a large impact on the number of messages sent. C2PL validates its cached pages prior to accessing them, and thus sends a round-trip message to the server on every initial access regardless of whether any read-write or write-write sharing is occurring. In contrast, CB-A, being avoidance-based, ACM Transactions on Database Systems, Vol. 22, No. 3, September 1997. Cache Consistency Fig. 14. • 355 Client cache hit %—readers only (FEED, 25% client cache, slow net). is able to read its cached data without contacting the server. CB-A, however, was seen to be sensitive to the type and degree of sharing in the system, as increased data sharing results in additional callback and downgrade messages. These results show that if detection is to be used, it must be done in an optimistic manner. Optimistic detection-based algorithms are discussed in Section 6.1. In addition to message count differences, the results for the UNIFORM workload demonstrated that the choice of an Invalid Access Prevention method can have an impact on I/O requirements and on the volume of data transferred. This is because detection-based approaches allow out-of-date pages to remain in client caches, reducing the effective size of those caches. Detection-based approaches can be extended with notification “hints” to help reduce those costs. 
The tradeoff between synchronous and deferred Write Intention Declaration (examined by comparing CB-R and O2PL-I) is one between pessimism and optimism, affecting both the number of messages required to complete transactions and the transaction abort rate. With no read-write or write-write sharing (e.g., under the PRIVATE workload), the approaches were seen to be roughly equal in performance. If sharing is present, then deferring declarations can save messages. If, however, sharing increases to the point where data contention arises, deferring declarations can lead to significantly higher abort rates; transaction aborts can result in higher resource utilization due to lost work [Franklin and Carey 1992; Franklin 1993], though this effect did not significantly hurt throughput in the workloads examined here. Furthermore, a high abort rate may be intolerable for users in some highly interactive applications.

The choice of Write Permission Duration was examined by comparing the performance of CB-R, which retains write permissions only for the duration of a single transaction, and CB-A, which retains write permissions across transaction boundaries. The impact of this tradeoff is on the number of messages sent. In general, retaining write permissions is a net win if a page is more likely to be updated at the client that holds the permissions than to be read at another client. Thus, CB-A was seen to have a slight advantage over CB-R in the PRIVATE workload, while it had a slight disadvantage in the other workloads examined.

The fourth design choice analyzed was the choice of Remote Update Action, as demonstrated by O2PL-I versus O2PL-P. In the PRIVATE workload, this choice has no effect because there is never a need for a remote update action. The other workloads, however, demonstrated that this choice can have a dramatic impact on performance. In the majority of the cases, invalidation was seen to be the right choice. Propagation was shown to be dangerously sensitive to the level of sharing present in the workload, and hence to the client cache size (with larger caches, the potential for sequential sharing across clients increases). In contrast, invalidation was seen to be much more stable in its performance characteristics. The FEED workload, however, demonstrated that there are indeed cases where propagation can be useful. Based on these results, we have developed and investigated dynamic algorithms that can adaptively choose between invalidation and propagation on a page-by-page basis at each client. These algorithms are discussed briefly in Section 6.4.

6. OTHER PROPOSED ALGORITHMS

As explained earlier, the performance experiments described in this article have covered only a portion of the large design space available for cache consistency maintenance. In particular, our work has focused on algorithms that lie on the avoidance-based side of the taxonomy in Section 4. The design decisions for detection-based algorithms, however, each have avoidance-based analogs. In the following, we apply the insight gained from our experiments to the other published algorithms that appear in the taxonomy.

6.1 Optimistic Detection-Based Algorithms

The first published paper to analyze transactional cache consistency algorithms for client-server OODBMSs was Wilkinson and Neimat [1990]. In that paper, two algorithms were proposed and studied.
One algorithm, called Cache Locks, is a detection-based algorithm that defers validation of transactions until commit time. Special lock modes and long-running "envelope transactions" are used to determine when transactions have accessed stale data. Cache Locks is an optimistic algorithm in the sense that lock requests are not sent to the server during transaction execution. At commit time, a transaction sends its read and write sets to the server, and the server attempts to obtain the necessary shared and exclusive locks. Special lock modes for locks held on cached copies by envelope transactions indicate whether or not the copies accessed by the transaction were valid. If it is determined that the transaction accessed stale cached data, then it is aborted. In order to reduce the likelihood of aborts due to stale data, the server piggybacks notifications about modified pages on its replies to client requests. These notifications inform the client that it should mark its copies of the affected data as invalid (hence, it is an invalidation-based algorithm) and abort any ongoing transactions that have accessed those data items.

More recently, an optimistic algorithm with notifications has been proposed for the Thor system at MIT [Adya et al. 1995]. This algorithm, called Adaptive Optimistic Concurrency Control (AOCC), is similar to the Cache Locks algorithm; it also includes support for transactions that access data from multiple servers (which is beyond the scope of both Wilkinson and Neimat [1990] and this article). Rather than using lock modes to represent invalid cached copies, AOCC maintains an invalid set for each client in order to keep track of which copies of the data items cached at a client have been made invalid. As described in Adya et al. [1995], AOCC uses a combination of invalidation and propagation for Remote Update Actions. As in Cache Locks, notifications are piggybacked on messages sent to clients, and such notifications invalidate cached copies. However, when a transaction is aborted due to a detected inconsistency, AOCC immediately piggybacks new copies (i.e., propagations) of the invalid items on the abort acknowledgment that it sends to the client.

As discussed in the previous section, a potential problem with detection-based policies is an increase in communication due to the need to check page validity with the server. The comparisons of C2PL and CB-A showed that this cost can be significant if a pessimistic (i.e., synchronous on each initial access) approach to validity checking is used. For this reason, both Cache Locks and AOCC use the more optimistic approach of deferring such checks until transaction commit time. As shown in Adya et al. [1995], such optimism, combined with the piggybacking of notifications, can significantly reduce the number of messages required for consistency checking; of course, this comes at the expense of a possible rise in transaction aborts. Whether such a tradeoff is beneficial depends on several factors, including the level of contention in the workload, the cost of wasted work due to aborts, and the tolerance of the application to a higher abort rate.

One major difference between Cache Locks and AOCC is that Cache Locks uses invalidation for remote update actions, while AOCC uses a combination of invalidation (in the absence of concurrent conflicts) and propagation (when a conflict has caused an abort).
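As a rough illustration of the commit-time check just described, the sketch below implements a server that tracks an invalid set per client and rejects a committing transaction whose read set intersects that set, returning fresh copies of the stale items with the abort. This is only a simplified rendering of the general idea behind AOCC, not the published protocol: locking, multiple servers, clock-based validation, and recovery are all omitted, and the names (register, fetch, commit) and data structures are hypothetical.

    # Hedged sketch of commit-time validation against per-client invalid sets.

    class Server:
        def __init__(self):
            self.data = {}        # page -> committed value
            self.invalid = {}     # client id -> pages whose cached copy is stale
            self.cached = {}      # client id -> pages cached at that client

        def register(self, client_id):
            self.invalid[client_id] = set()
            self.cached[client_id] = set()

        def fetch(self, client_id, page):
            self.cached[client_id].add(page)
            self.invalid[client_id].discard(page)     # the copy is now current
            return self.data.get(page)

        def commit(self, client_id, read_set, write_set, new_values):
            """Validate, then install writes, or abort and piggyback fresh copies."""
            stale = read_set & self.invalid[client_id]
            if stale:
                # Abort; returning new copies of the stale items means an
                # immediate re-run can hit in the client cache (propagation).
                return False, {p: self.data.get(p) for p in stale}
            for page in write_set:
                self.data[page] = new_values[page]
                # Copies of this page at other clients are now stale; remember
                # that so an invalidation can be piggybacked on a later reply.
                for other, pages in self.cached.items():
                    if other != client_id and page in pages:
                        self.invalid[other].add(page)
            return True, {}

    if __name__ == "__main__":
        srv = Server()
        srv.register("c1"); srv.register("c2")
        srv.data["p"] = 1
        srv.fetch("c1", "p"); srv.fetch("c2", "p")
        ok1, _ = srv.commit("c1", {"p"}, {"p"}, {"p": 2})        # commits
        ok2, fresh = srv.commit("c2", {"p"}, set(), {})          # aborts
        print(ok1, ok2, fresh)                                   # True False {'p': 2}

In the usage example, the second client's transaction aborts because its read set intersects its invalid set, and the new value of the page is returned along with the abort, corresponding to the propagation-on-abort behavior attributed to AOCC above.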
These propagations can be very useful in reducing the costs of transaction aborts as long as the aborted transactions are restarted immediately and tend to access the same items as they did in their previous incarnation(s); otherwise, the inefficiencies that were identified for propagation in Section 5.3 may be incurred by this approach as well.

6.2 Notify Locks

The second algorithm proposed in Wilkinson and Neimat [1990], Notify Locks, is an avoidance-based algorithm. It is similar to the O2PL-P algorithm described previously in that it defers Write Intention Declaration until the end of transaction execution and uses propagation for remote update actions. When a transaction wishes to commit, it sends copies of the updated data items back to the server. The server then sends notification messages to any clients that hold copies of the updated items; these messages contain the new values of those items. A major difference between Notify Locks and O2PL-P is that with Notify Locks, the arrival of a notification preempts any ongoing transactions that have accessed the changed items. In contrast, O2PL-P blocks notification requests that conflict with read locks held by ongoing transactions. Because of the preemption approach used by Notify Locks, committing a transaction requires (sometimes multiple) handshakes between the client and the server to avoid race conditions at commit time. The performance tradeoffs between the wait and preempt policies, however, have not been addressed in this study. Of course, because Notify Locks uses propagation, it is clearly subject to the performance problems that we saw for O2PL-P. This effect was not detected in Wilkinson and Neimat [1990] because that study used a probabilistic cache model that assumed that cache hit probabilities were independent of cache size.

6.3 No-Wait Locking

No-wait locking algorithms were studied in Wang and Rowe [1991]. No-wait algorithms are detection-based algorithms that try to hide the latency of validations at the server by performing validity checking asynchronously. As with all detection-based algorithms, transactions must abort if they are found to have accessed stale data. By initiating the validity checks before commit time, however, the window during which data can become invalid is shortened compared to Cache Locks and AOCC. As stated in Section 5.1.4, asynchrony does not reduce the total work required, and thus will not improve performance in a highly utilized system (e.g., if the server is a bottleneck). The performance results of Wang and Rowe [1991] showed that an algorithm similar to CB-R typically performed as well as or better than No-Wait Locking.

To reduce the possibility of stale data access, the No-Wait algorithm was extended in Wang and Rowe [1991] with a propagation-based notification hint scheme. The performance of this algorithm, called No-Wait Locking with Notifications, was then examined. The results of that study showed (as we did in Carey et al. [1991]) that the cost of propagations typically outweighs their potential benefits. An invalidation-based notification scheme could avoid this problem, but such a scheme was not studied in Wang and Rowe [1991].
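The asynchrony that distinguishes no-wait checking from purely commit-time validation can also be sketched. The fragment below is only an illustration of the general idea, not the protocol of Wang and Rowe [1991]: a validity check is launched in the background when a cached page is first touched, the transaction proceeds without waiting, and the replies are collected at commit time, turning any stale read into an abort. The server stub, thread pool, and timing are assumptions introduced for the example.

    # Hedged sketch of asynchronous ("no-wait") validity checking.
    import concurrent.futures
    import time

    class ServerStub:
        def __init__(self, versions):
            self.versions = versions                  # authoritative versions

        def check(self, page, cached_version):
            time.sleep(0.05)                          # simulated network latency
            return self.versions[page] == cached_version

    class NoWaitClient:
        def __init__(self, server, cache):
            self.server = server
            self.cache = cache                        # page -> cached version
            self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
            self.pending = {}                         # page -> future of the check

        def read(self, page):
            if page not in self.pending:              # fire the check, don't wait
                self.pending[page] = self.pool.submit(
                    self.server.check, page, self.cache[page])
            return self.cache[page]                   # proceed optimistically

        def commit(self):
            ok = all(f.result() for f in self.pending.values())
            self.pending.clear()
            return ok                                 # False: the transaction aborts

    if __name__ == "__main__":
        server = ServerStub({"a": 3, "b": 7})
        client = NoWaitClient(server, cache={"a": 3, "b": 6})    # "b" is stale
        client.read("a"); client.read("b")
        print("commit" if client.commit() else "abort")          # prints "abort"

Because the checks overlap with transaction execution their latency is hidden, but the number of validation messages is unchanged, which is consistent with the observation above that asynchrony does not reduce the total work required.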
6.4 Dynamic Optimistic Two-Phase Locking

The two remaining algorithms shown in the taxonomy of Figures 2 and 3 are variants of O2PL that choose dynamically between invalidation and propagation on a page-by-page basis. The original dynamic algorithm (O2PL-Dynamic) was introduced in Carey et al. [1991]. This algorithm used a simple heuristic that would initially propagate an update to a remotely cached page copy, switching to invalidation the next time if the propagation went unused. An improved heuristic (called O2PL-NewDynamic), which initially favors invalidation over propagation, was described and studied in Franklin and Carey [1992] and Franklin [1993]. Those studies showed that by favoring invalidation, O2PL-NewDynamic was able to match the performance of O2PL-I in those workloads where it had the best performance (i.e., most workloads tested), and to approach the superior performance of O2PL-P in the FEED workload (which is the one case where that algorithm provided the best performance).

7. CONCLUSIONS

In this article we began by describing the potential benefits of caching in client-server database systems based on the data-shipping approach. The introduction of caching raises the need for mechanisms to ensure that transaction semantics are not violated as a result of dynamic replication. We refer to such mechanisms as transactional cache consistency maintenance algorithms. We presented a taxonomy that describes the design space for such algorithms and showed how it encompasses the algorithms that have been proposed in the literature. Six algorithms, taken from three different families, were then described in more detail and analyzed. These algorithms were used to explore many of the tradeoffs inherent in the design choices of the taxonomy. The insight gained was then used to reflect upon the characteristics of other algorithms that appear in the taxonomy.

The choice of avoidance versus detection for preventing invalid access was seen to have a significant impact on the number of messages sent for maintaining consistency. Under pessimistic-style approaches, avoidance typically sends far fewer messages than detection. As a result, most detection-based schemes that have been proposed employ optimistic techniques that defer consistency actions or perform them asynchronously. Such techniques reduce the number of messages sent at the expense of increasing the probability of transaction aborts. A secondary effect of the choice of invalid access prevention is that avoidance-based techniques are able to use client caches more efficiently, as they allow only valid data to reside in the caches. Efficient cache usage can reduce the number of pages that must be obtained from the server, saving messages, message volume, and possibly even server I/O. Several of the detection-based algorithms have been extended with notification hints that help remove invalid pages from client caches. These hints reduce the potential for aborts due to accessing invalid pages and help to ensure more efficient use of the caches.

The choice between synchronous and deferred declaration of write intentions was seen to be a tradeoff between the number of messages sent and the transaction abort rate. Deferring declarations introduces another form of optimism, which can reduce messages but may also increase aborts.
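As a back-of-the-envelope illustration of this message tradeoff, the fragment below simply counts round trips under two hypothetical policies: declaring each write intention synchronously as the corresponding write occurs, versus bundling all declarations into the single commit request. The counting assumptions (one round trip per synchronous declaration plus one for commit, and exactly one round trip for a deferred commit) are ours for the purpose of the example, not measurements from the simulation study described above.

    # Hedged sketch: round trips for synchronous vs. deferred write declarations.

    def synchronous_declarations(pages_written):
        # One round trip per distinct page written, plus the commit round trip.
        return len(set(pages_written)) + 1

    def deferred_declarations(pages_written):
        # All declarations ride along with the single commit round trip;
        # the argument is unused because the count does not depend on it.
        return 1

    if __name__ == "__main__":
        writes = ["p1", "p2", "p2", "p3"]
        print("synchronous:", synchronous_declarations(writes))   # 4
        print("deferred:   ", deferred_declarations(writes))      # 1

The saving grows with the number of distinct pages written per transaction; the price, as noted above, is a longer window during which a conflicting update elsewhere can force the deferring transaction to abort.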
A third design decision, the duration of write permissions, was examined using two variants of Callback Locking. The tradeoff here lies in the number of messages sent, and it is workload-dependent. In situations with high locality and low data conflict rates, retaining write permissions across transaction boundaries was seen to save messages, while with low locality and high data conflict rates, retaining write permissions was shown to result in a net increase in messages sent. These observations indicate that a dynamic algorithm that can choose between these two strategies is likely to perform well.

Finally, the choice between invalidating remote copies and propagating changes to them was investigated by comparing two variants of the Optimistic Two-Phase Locking approach. Invalidation was seen to be quite robust in the face of changes to a number of workload and configuration parameters. In contrast, propagation was shown to be dangerously sensitive to the level of sequential sharing and to the client cache sizes; however, it was also demonstrated to be beneficial in a workload meant to model an information dissemination environment. In the absence of a dynamic approach or detailed information about client access patterns, invalidation is clearly the safest choice for most situations.

The work reported here has been extended in several ways. The extension of these performance results to client disk caching was investigated in Franklin et al. [1993]. Client disk caching raises additional problems, such as the relatively large size of disk-based caches (compared to memory caches) and the tradeoffs of accessing data from the local disk versus obtaining it from a server. Issues that arise when clients are allowed to obtain data from each other (in addition to servers) were studied in Franklin et al. [1992]. More recently, callback-style approaches have been extended to support multiple granularities of concurrency control and cache consistency [Carey et al. 1994; Chu and Winslett 1994].

Current trends in client-server database systems raise additional challenges that must be addressed as well. In particular, the merging of Relational and Object technologies requires systems that can efficiently support both the navigational style of data access assumed in this study and the query-oriented access typically associated with relational systems. The development of distributed database architectures that efficiently support both associative and navigational access is a major focus of our ongoing work.

REFERENCES

ADVE, S. AND GHARACHORLOO, K. 1995. Shared memory consistency models: A tutorial. Tech. Rep. 95/7 (Sept.), Digital Western Research Laboratory.
ADYA, A., GRUBER, R., LISKOV, B., AND MAHESHWARI, U. 1995. Efficient optimistic concurrency control using loosely synchronized clocks. In Proceedings of the ACM SIGMOD Conference on Management of Data (San Jose, CA, May), 23–34.
AGARWAL, A., SIMONI, R., HENNESSY, J., AND HOROWITZ, M. 1988. An evaluation of directory schemes for cache coherence. In Proceedings 15th International Symposium on Computer Architecture (Honolulu, HI, June), 208–289.
ALSBERG, P. A. AND DAY, J. D. 1976. A principle for resilient sharing of distributed resources. In Proceedings 2nd International Conference on Software Engineering.
ARCHIBALD, J. AND BAER, J. 1986. Cache coherence protocols: Evaluation using a multiprocessor simulation model. ACM Trans. Comput. Syst. 4, 4 (Nov.), 273–298.
BAKER, M. G., HARTMAN, J. H., KUPFER, M. D., SHIRRIFF, K. W., AND OUSTERHOUT, J. K. 1991. Measurements of a distributed file system. In Proceedings 13th ACM Symposium on Operating System Principles (Pacific Grove, CA, Oct.), 198–212.
BERNSTEIN, P. A., HADZILACOS, V., AND GOODMAN, N. 1987. Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading, MA.
BHIDE, A. AND STONEBRAKER, M. 1988. An analysis of three transaction processing architectures. In Proceedings of the Conference on Very Large Data Bases (VLDB) (Los Angeles, CA, Aug.), 339–350.
BUTTERWORTH, P., OTIS, A., AND STEIN, J. 1991. The GemStone object database management system. Commun. ACM 34, 10 (Oct.), 64–77.
CAREY, M., DEWITT, D., AND NAUGHTON, J. 1993. The 007 benchmark. In Proceedings of the ACM SIGMOD Conference on Management of Data (Washington, DC, May), 12–21.
CAREY, M., FRANKLIN, M., AND ZAHARIOUDAKIS, M. 1994. Fine-grained sharing in a page server OODBMS. In Proceedings of the ACM SIGMOD Conference on Management of Data (Minneapolis, MN, May), 359–370.
CAREY, M. AND LIVNY, M. 1991. Conflict detection tradeoffs for replicated data. ACM Trans. Database Syst. 16, 4 (Dec.), 703–746.
CAREY, M. J., DEWITT, D. J., FRANKLIN, M. J., HALL, N. E., MCAULIFFE, M. L., NAUGHTON, J. F., SCHUH, D. T., SOLOMON, M. H., TAN, C. K., TSATALOS, O. G., WHITE, S. J., AND ZWILLING, M. J. 1994. Shoring up persistent applications. In Proceedings of the ACM SIGMOD Conference on Management of Data (Minneapolis, MN, May), 383–394.
CAREY, M. J., FRANKLIN, M. J., LIVNY, M., AND SHEKITA, E. J. 1991. Data caching tradeoffs in client-server DBMS architectures. In Proceedings of the ACM SIGMOD Conference on Management of Data (Denver, CO, May), 357–366.
CATTELL, R. G. G. 1991. Object Data Management: Object-Oriented and Extended Relational Database Systems. Addison-Wesley, Reading, MA.
CATTELL, R. G. G. AND SKEEN, J. 1992. Object operations benchmark. ACM Trans. Database Syst. 17, 1 (Mar.), 1–31.
CHU, S. AND WINSLETT, M. 1994. Minipage locking support for page-server database management systems. In Proceedings 3rd International Conference on Information and Knowledge Management (Gaithersburg, MD, Nov.).
DAN, A., DIAS, D. M., AND YU, P. S. 1990. The effect of skewed data access on buffer hits and data contention in a data sharing environment. In Proceedings International Conference on Very Large Data Bases (Brisbane, Australia, Aug.), 419–431.
DAN, A., DIAS, D. M., AND YU, P. S. 1991. Analytical modeling of a hierarchical buffer for a data sharing environment. In Proceedings 1991 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (San Diego, CA, May 21–24), 156–167.
DAN, A. AND YU, P. S. 1991. Performance comparisons of buffer coherency policies. In Proceedings 11th International Conference on Distributed Computing Systems (Arlington, TX, May).
DAN, A. AND YU, P. S. 1992. Performance analysis of coherency control policies through lock retention. In Proceedings of the ACM SIGMOD Conference on Management of Data (San Diego, CA, June), 114–123.
DAVIDSON, S. B., GARCIA-MOLINA, H., AND SKEEN, D. 1985. Consistency in partitioned networks. ACM Comput. Surv. 17, 3 (Sept.), 341–370.
DEWITT, D. J., FUTTERSACK, P., MAIER, D., AND VELEZ, F. 1990. A study of three alternative workstation-server architectures for object-oriented database systems. In Proceedings International Conference on Very Large Data Bases (Brisbane, Australia, Aug.), 107–121.
DIAS, D., IYER, B., ROBINSON, J., AND YU, P. 1987. Design and analysis of integrated concurrency-coherency controls. In Proceedings International Conference on Very Large Data Bases (Brighton, England, Aug.), 463–471.
EXODUS PROJECT GROUP. 1993. Exodus storage manager architectural overview. Computer Sciences Dept., Univ. of Wisconsin-Madison.
FRANKLIN, M. J. 1993. Caching and memory management in client-server database systems. Ph.D. thesis, Univ. of Wisconsin, Madison. (Published as Client Data Caching: A Foundation for High Performance Object Database Systems, Kluwer Academic Publishers, Boston, MA.)
FRANKLIN, M. J. AND CAREY, M. 1992. Client-server caching revisited. In Proceedings International Workshop on Distributed Object Management (Edmonton, Canada, May), 57–78. (Published as Distributed Object Management, Ozsu, Dayal, Valduriez, Eds., Morgan Kaufmann, San Mateo, CA, 1994.)
FRANKLIN, M. J., CAREY, M. J., AND LIVNY, M. 1992. Global memory management in client-server DBMS architectures. In Proceedings of the Conference on Very Large Data Bases (VLDB) (Vancouver, Canada), 596–609.
FRANKLIN, M. J., CAREY, M. J., AND LIVNY, M. 1993. Local disk caching in client-server database systems. In Proceedings of the Conference on Very Large Data Bases (VLDB) (Dublin, Ireland, Aug.), 543–554.
FRANKLIN, M. J., ZWILLING, M. J., TAN, C. K., CAREY, M. J., AND DEWITT, D. J. 1992. Crash recovery in client-server EXODUS. In Proceedings of the ACM SIGMOD Conference on Management of Data (San Diego, CA, June), 165–174.
GHARACHORLOO, K., LENOSKI, D., LAUDON, J., GIBBONS, P., GUPTA, A., AND HENNESSY, J. L. 1990. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings 17th International Symposium on Computer Architecture (June), 15–26.
GOODMAN, J. R. 1983. Using cache memory to reduce processor-memory traffic. In Proceedings 10th ACM Symposium on Computer Architecture (Stockholm, Sweden, June), 124.
GRAY, J. AND REUTER, A. 1993. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, San Mateo, CA.
HOWARD, J. H., KAZAR, M. L., MENEES, S. G., NICHOLS, D. A., SATYANARAYANAN, M., SIDEBOTHAM, R. N., AND WEST, M. J. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 51–81.
JOSHI, A. M. 1991. Adaptive locking strategies in a multi-node data sharing model environment. In Proceedings International Conference on Very Large Data Bases (Barcelona, Spain, Sept.), 181–191.
KELEHER, P., COX, A., AND ZWAENEPOEL, W. 1992. Lazy release consistency for software distributed shared memory. In Proceedings 19th Annual International Symposium on Computer Architecture (May), 13–21.
KEMPER, A. AND KOSSMANN, D. 1994. Dual-buffering strategies in object bases. In Proceedings of the Conference on Very Large Data Bases (VLDB) (Santiago, Chile, Sept.), 427–438.
KIM, W., GARZA, J. F., BALLOU, N., AND WOELK, D. 1990. Architecture of the ORION next-generation database system. IEEE Trans. Knowl. Data Eng. 2, 1 (Mar.), 109–124.
KISTLER, J. J. AND SATYANARAYANAN, M. 1991. Disconnected operation in the Coda file system. In Proceedings 13th ACM Symposium on Operating System Principles (Pacific Grove, CA, Oct.), 213–225.
LAMB, C., LANDIS, G., ORENSTEIN, J., AND WEINREB, D. 1991. The ObjectStore database system. Commun. ACM 34, 10, 50–63.
LAMPORT, L. 1979. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. C-28, 9, 690–691.
LEVY, E. AND SILBERSCHATZ, A. 1990. Distributed file systems: Concepts and examples. ACM Comput. Surv. 22, 3 (Dec.), 321–374.
LI, K. AND HUDAK, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov.), 321–359.
LILJA, D. J. 1993. Cache coherence in large-scale shared-memory multiprocessors: Issues and comparisons. ACM Comput. Surv. 25, 3 (Sept.), 303–338.
LISKOV, B., DAY, M., AND SHRIRA, L. 1992. Distributed object management in Thor. In Proceedings International Workshop on Distributed Object Management (Edmonton, Canada, May), 79–91. (Published as Distributed Object Management, Ozsu, Dayal, Valduriez, Eds., Morgan Kaufmann, San Mateo, CA, 1994.)
LIVNY, M. 1990. DeNet user's guide, version 1.5. Computer Sciences Dept., University of Wisconsin-Madison.
LOMET, D. 1994. Private locking and distributed cache management. In Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems (Austin, TX, Sept.), 151–159.
MOHAN, C. AND NARANG, I. 1991. Recovery and coherency-control protocols for fast intersystem transfer and fine-granularity locking in a shared disks transaction environment. In Proceedings International Conference on Very Large Data Bases (Barcelona, Spain, Sept.), 193–207.
NELSON, M. N., WELCH, B. B., AND OUSTERHOUT, J. K. 1988. Caching in the Sprite network file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 134–154.
NITZBERG, B. AND LO, V. 1991. Distributed shared memory: A survey of issues and algorithms. IEEE Comput. 24, 8 (Aug.), 52–60.
O. DEUX ET AL. 1991. The O2 system. Commun. ACM, Special Section on Next-Generation Database Systems, 34, 10 (Oct.), 34–49.
OBJECTIVITY INC. 1991. Objectivity/DB Documentation V. 1.
ONTOS INC. 1992. Ontos DB 2.2 Reference Manual.
RAHM, E. 1991. Concurrency and coherency control in database sharing systems. Tech. Rep. 3/91 (Nov.), Computer Science Dept., Univ. of Kaiserslautern, Germany.
RAHM, E. 1993. Empirical performance evaluation of concurrency and coherency control protocols for database sharing systems. ACM Trans. Database Syst. 18, 2 (June), 333–377.
RAMAKRISHNAN, K. K., BISWAS, P., AND KAREDLA, R. 1992. Analysis of file I/O traces in commercial computing environments. In Proceedings 1992 ACM SIGMETRICS and PERFORMANCE '92 International Conference on Measurement and Modeling of Computer Systems (Newport, RI, June 1–5), 78–90.
SANDHU, H. AND ZHOU, S. 1992. Cluster-based file replication in large-scale distributed systems. In Proceedings 1992 ACM SIGMETRICS and PERFORMANCE '92 International Conference on Measurement and Modeling of Computer Systems (Newport, RI, June 1–5), 91–102.
STENSTROM, P. 1990. A survey of cache coherence protocols for multiprocessors. IEEE Comput. 23, 6 (June), 12–24.
STONEBRAKER, M. 1979. Concurrency control and consistency of multiple copies of data in distributed INGRES. IEEE Trans. Softw. Eng. 5, 3 (May), 188–194.
STURGIS, H., MITCHELL, J., AND ISRAEL, J. 1980. Issues in the design and use of a distributed file system. Oper. Syst. Rev. 14, 3 (July).
VERSANT OBJECT TECHNOLOGY. 1991. VERSANT system reference manual, release 1.6.
WANG, Y. AND ROWE, L. A. 1991. Cache consistency and concurrency control in a client/server DBMS architecture. In Proceedings of the ACM SIGMOD Conference on Management of Data (Denver, CO, May), 367–377.
WILKINSON, K. AND NEIMAT, M.-A. 1990. Maintaining consistency of client-cached data. In Proceedings of the Conference on Very Large Data Bases (VLDB) (Brisbane, Australia), 122–134.

Received October 1995; revised August 1996; accepted November 1996