
Collection Schemes for Distributed Garbage.

COLLECTION SCHEMES FOR DISTRIBUTED GARBAGE *

Saleh E Abdullahi, Eliot E Miranda and Graem A Ringwood
Department of Computer Science
Queen Mary and Westfield College
University of London
LONDON E1 4NS
yakubu|eliot|gar @dcs.qmw.ac.uk

* This paper was presented at the International Workshop on Memory Management (IWMM92), St. Malo, France, September 1992. It appeared in the proceedings published by Springer-Verlag in LNCS 637, pages 43-81.

Abstract: With the continued growth in interest in distributed systems, garbage collection is actively receiving attention by designers of distributed languages [Bal, 1990]. Distribution adds another dimension of complexity to an already complex problem. A comprehensive review and bibliography of distributed garbage collection literature up to 1992 is presented. As distributed collectors are largely based on nondistributed collectors these are first briefly reviewed. Emphasis is given to collectors which appeared since the last major review [Cohen, 1981]. Collectors are broadly classified as those that identify garbage directly and those that identify it indirectly. Distributed collectors are reviewed on the basis of the taxonomy drawn up for nondistributed collectors.

1.0 Introduction

Garbage collection is a necessary evil of computer languages which employ dynamic data structures. Abstractly, the state of a computation expressed in such languages can be understood as a rooted, connected, directed graph. Some edges, roots, are distinguished in that they provide entry points into the graph. The vertices of the computation graph are represented by cells, the units of allocation and deallocation of contiguous segments of store. (Nothing will be assumed about the sizes of cells.) Edges of the graph are represented by pointer fields within cells. Roots are pointers to vertices from the execution stack, global variables or registers. As a computation proceeds the graph changes by the addition and deletion of vertices and edges. As a result, some portions of the graph become disconnected. These disconnected subgraphs are known as garbage.

Fig 1. A representative, though small, state of a computation. (Figure labels: roots from the stack, registers and globals; cells; references within cells (pointer fields); garbage.)

Without reutilization, the finite store available for allocating new vertices diminishes to zero. The process by which the store occupied by discarded cells can be reutilized is called garbage collection. The earliest forms of store management placed the responsibility for allocation and reclamation on the programmer. Today this is considered too error-prone if not burdensome and a wide variety of languages provide automatic allocation and reclamation as part of their runtime system. Recent reports for various languages are: Smalltalk [Krasner, 1983; Ungar, 1984; Caudill, 1986; Miranda, 1987]; Prolog [Appleby et al, 1988]; ML [Li, 1990]; C++ [Bartlett, 1990; Detlefs, 1990a; Edelson and Pohl, 1990], Modula-2+ [DeTreville, 1990; Juul, 1990] and Modula-3 [Hudson and Diwan, 1990].

An important addition to the terminology of garbage collection was introduced by Dijkstra [1978]. The process which adds new vertices and adds and deletes edges is called the mutator. The mutator is an abstraction of the running program. The process which reclaims garbage is called the collector.

Historically, the major disadvantage of automatic collection was that it significantly detracted from the performance of the mutator, both by introducing unpredictable, long pauses and using large proportions of available processing cycles. Measurements of early Smalltalk-80 implementations indicate that 20% to 70% of the time was spent collecting garbage [Krasner, 1983]. For Lisp, collection overheads of between 10% and 40% were reported [Steele, 1975; Wadler, 1976], with pause times of 4.5 seconds every 79 seconds [Foderaro, 1981].
Over the previous decade much progress has been made; and the current state-of-the-art for Smalltalk-80 is less than 5% collector overhead, with typically better than 100 millisecond pause-times [Ungar, 1992].

Efficient garbage collection is so useful and so difficult to make unobtrusive that it has been a field of active research for over three decades. It constitutes a major concern for language designers. Knuth [1973] invented some and analysed other collectors which appeared prior to 1968. Cohen [1981] performed a public service with a survey of papers up to 1981. While there have since been numerous papers on garbage collection, they have tended to be language specific. Some languages allow optimizations which are not generally applicable. The semantics of a language does restrict the topology of the computation graph and graphs may be: cyclic; acyclic or tree-like. The topology in turn restricts the type of collector which can be employed.

A significant complication to the problem of garbage collection since Cohen [1981] has arisen with the spreading web of distributed systems [Bal, 1990]. According to Bal: "A distributed computing system consists of multiple autonomous processors, nodes, that do not share primary memory, but cooperate by sending messages over a communications network." The advantages of distribution are:
- improved performance through parallelism;
- increased availability and reliability through redundancy;
- reduced communication by dispersion of processing power to where it is needed and
- incremental growth through the addition of nodes and communication links.

The convincing factor is the economic consequences to which these advantages give rise. While distributed applications can be built directly on top of operating systems, Bal [1990] puts forward convincing arguments for programming languages which contain all the necessary constructs for distributed programming.
For such languages, the computation graph is distributed over a number of nodes. The absence of a homogeneous address space and the high cost of communication relative to local computation make distributed garbage collection a significantly more complex problem than collection on a single node.

The purpose of this paper is to give as comprehensive as possible a review and bibliography of distributed garbage collectors, subject to space limitations. As distributed collectors are generally based on nondistributed collectors the latter are first briefly classified. Special attention is given to incremental and concurrent collectors which are directly relevant to distribution. Emphasis will be placed on papers and trends published since Cohen [1981]. The majority of these papers relate to object-oriented languages, but for the ideal of treating different languages uniformly, herein, objects will be referred to as cells. The final section reviews distributed collectors on the basis of the taxonomy drawn up for single node collectors.

2.0 Single Node Collectors

Following Cohen [1981] the collection process consists of: 1) identification and 2) reclamation of garbage for reuse. The way in which garbage is identified distinguishes two classes of collectors. Garbage identification can be made directly, identifying cells that become disconnected from the computation graph or indirectly by identifying the cells forming the computation graph; what then remains must be garbage (and unallocated store).

The form of reclamation is dependent on how the free store is managed. It can either be managed as a freelist (equally well a bitmap or buddy system) or a heap. If managed by a freelist, garbage is coalesced into the list. If managed as a heap, the division between the allocated and unallocated store is indicated by a single pointer, the top of heap, and reclamation can be performed either by compacting or by copying.
Fig 2. The garbage collection problem. (Figure labels: garbage collection divides into identification, which is direct or indirect, and reclamation; store management is by freelist, reclaimed by coalescing, or by heap, reclaimed by copying or compacting.)

Various collectors have been proposed which seek to optimise different criteria. Some aim to minimize the total percentage time spent collecting garbage; some aim to minimize the period of time taken in any one invocation of the collector (to provide predictable performance for realtime or interactive programming); some aim to minimize the space overhead (the memory required to identify and collect garbage); some are concerned with localization which is important for the efficient use of virtual memory. The next section gives a brief survey developing a taxonomy in terms of the advantages and disadvantages of different species.

2.1 Direct identification of garbage

Direct identification of garbage can be made using a reference count. In its simplest form, a cell holds a count of the number of references to it [Collins, 1960]. If as a result of a mutator operation the count falls to zero, the cell is garbage, since it can no longer be reached from a root. The collector can immediately reclaim the cell, recursively decrement the counts of its referents, and reclaim those whose counts also fall to zero. Naturally enough, this process is known as recursive freeing. A feature of reference counting is that garbage is reclaimed as soon as it is identified.

One of a number of disadvantages of reference counting is the space overhead of the count. It has been observed [Krasner, 1983] that the majority of cells have a small reference count. Consequently, the size of the count field of a cell is chosen to be smaller than is needed to represent all possible references. Typically, systems allocate one byte to hold the reference count. Once a count reaches the ceiling, saturation, it is not altered and no longer accurately reflects the number of references to a cell.
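The simple form of reference counting described above, with recursive freeing and a saturating count, can be sketched as follows. This is a minimal illustration only: the names `Cell`, `add_ref`, `release` and `drop_ref`, the ceiling of 127 and the use of a Python list as the freelist are assumptions of the sketch, not details taken from any of the cited systems.

```python
from dataclasses import dataclass, field

SATURATED = 127  # assumed ceiling of the count field in this sketch


@dataclass(eq=False)  # identity comparison, so list.remove() finds the exact cell
class Cell:
    name: str
    refs: list = field(default_factory=list)  # pointer fields
    count: int = 0
    freed: bool = False


def add_ref(holder_refs, target):
    """Store a pointer to target and increment its count unless saturated."""
    holder_refs.append(target)
    if target.count < SATURATED:
        target.count += 1


def release(cell, freelist):
    """Decrement a count; on reaching zero, reclaim the cell and recurse."""
    if cell.count == SATURATED:
        return  # a saturated count is never altered again
    cell.count -= 1
    if cell.count == 0:
        cell.freed = True
        freelist.append(cell)
        referents, cell.refs = cell.refs, []
        for referent in referents:  # recursive freeing
            release(referent, freelist)


def drop_ref(holder_refs, target, freelist):
    """Delete one pointer to target and adjust its count."""
    holder_refs.remove(target)
    release(target, freelist)
```

In this sketch a root is just another list of pointers, so root references are counted like any others. A cell whose count has saturated is never reclaimed by `release`, which is why a backup collector is needed for such cells.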
To cheapen the test for saturation a count is saturated if the signed byte is negative, allowing the count to record from 0 to 127 references.

Clark's measurements of LISP programs (see [Deutsch and Bobrow, 1976; Field and Harrison, 1988]) show that about 97% of list cells have a reference count of 1. This suggests an extreme form of saturation using a single-bit count [Friedman and Wise, 1977]. A clear bit is used to indicate a single reference to a cell. When a second reference to the cell is created the bit is set. Once set the bit cannot be cleared because it cannot be determined, without great cost, if the cell has more than one reference. To reclaim cells that acquire more than one reference during their lifetime, it is necessary to employ a second type of collector. Because of the predominance of single references, this collector will be invoked considerably less often than if it were used on its own. Single-bit reference counts are efficient and have the additional advantage that they can be stored in cell pointers rather than in the cell itself. Duplicating a pointer then does not require access to the cell to adjust the count.

2.2 Indirect identification of garbage

A second disadvantage of reference counting is the difficulty it has with reclaiming circular structures. The reason for this is the locality of identification. It is expensive to determine if the destruction of one local pointer has disconnected a portion of the graph. A disconnected cyclic structure will have no edges connecting it with the roots of the computation graph but each of its cells will have a nonzero reference count. Some reference counting schemes do exist that attempt to reclaim cyclic garbage, but they are tedious, complex [Friedman and Wise, 1979], lack generality [Bobrow, 1980] and have significant computational overhead [Brownbridge, 1985; Hughes, 1985; Rudalics, 1986; Watson, 1986].
The problem can be overcome by requiring that the programmer explicitly break cycles of references or, more typically, by supplementing reference counting with a second collector that identifies garbage indirectly [Goldberg and Robson, 1983].

Collectors that identify garbage indirectly take a global view. Traversing the computation graph from the roots and visiting all vertices will identify those cells which are definitely not garbage. By default, the unvisited part of the store is garbage or unallocated. By such means, cyclically connected subgraphs which become disconnected are (indirectly) identified and can be collected.

Mark-and-sweep collectors postpone collection until the free store is exhausted. Mutation is then temporarily suspended. Identification and reclamation are treated as sequential phases. The first phase traverses the computation graph marking all accessible cells. In its simplest form a single markbit is sufficient to indicate whether or not a cell is pointed to by other cells reachable from a root. This markbit is comparable with a single-bit reference count. A difference is that for the markbit, those cells whose counts are equal to zero are declared garbage while the others are part of the computation graph. The marking phase concludes when all accessible cells have been marked. A sweep of the entire store reclaims the unmarked cells and clears the marked ones [McCarthy, 1960]. Single-bit reference counting is further distinguished from mark-and-sweep by the periods in which the bits hold accurate information. For mark-and-sweep the information is only consistent at the end of the sweep phase. For reference counting it is made consistent after every mutation.

The free storage can be managed as a freelist or a heap. With heap management reclamation can be achieved by compaction. For fixed size cells, compaction can be performed by sweeping the heap twice [Cohen, 1967].
In the first pass, two pointers are used, one starting at the bottom of the heap, the other at the top. The pointer to the top of the heap scans down until it points to a marked cell. The pointer to the bottom of the heap scans up until it points to an unmarked cell. At this point, the contents of the marked cell are copied to the unmarked cell (assuming the cells are the same size), the markbit is cleared and a forwarding pointer to the new cell is placed in the old position. When the two pointers meet, all marked cells have been unmarked and compacted in the upper part of the heap. The second scan is needed for readjusting pointers to moved cells. Any cells that refer to cells in the compacted area are adjusted by following forwarding pointers. Martin [1982] combines the marking phase with a rearrangement of the pointers so that they can be moved more readily. Carsson, Mattsson and Bengtsson [1990] present a variation in which during the mark phase the pointer fields of the accessible cells (not the whole cells) are copied into a table and the cells are marked as visited. After sorting the addresses the reachable cells are compacted by sliding the cells to one end of the store.

If the store is managed as a freelist and the computation graph contains cells of differing sizes, allocation will in general fragment the free store. When an allocation request is made, the freelist may contain no free cells of the required size, but may contain cells larger than that required. Typically, the allocator will satisfy the request by splitting a larger cell into an allocated cell and a remaining free fragment. Over time, the freelist becomes composed of smaller and smaller fragments. Eventually a situation occurs where no free cell is large enough to meet the allocation yet the total size of free space is sufficient. The allocation can be met by coalescing the fragments into a single, or at least a larger, cell. This is done by compacting the cells forming the computation graph.
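The two-pass compaction of fixed size cells described at the start of this passage can be sketched as follows. The heap is modelled as a Python list whose indices stand in for addresses, and the names `Cell`, `compact`, `lo` and `hi` are assumptions of the sketch. Marks are taken as given, as if a marking phase had just completed, and in this model the live cells end up at the low-index end of the list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cell:
    value: str = ""
    fields: list = field(default_factory=list)  # heap indices of referents
    marked: bool = False
    forward: Optional[int] = None  # forwarding pointer left in a vacated slot


def compact(heap, roots):
    """Compact marked cells toward index 0; return (top_of_heap, new_roots)."""
    lo, hi = 0, len(heap) - 1
    # First pass: move marked cells from the high end into free slots at the low end.
    while True:
        while lo <= hi and heap[lo].marked:
            heap[lo].marked = False  # live cell already in place
            lo += 1
        while lo <= hi and not heap[hi].marked:
            hi -= 1  # skip garbage
        if lo > hi:
            break
        heap[lo].value, heap[lo].fields = heap[hi].value, heap[hi].fields
        heap[hi] = Cell(forward=lo)  # old position now holds a forwarding pointer
        lo += 1
        hi -= 1
    top = lo  # cells [0, top) are live; everything above is free

    def fix(pointer):
        # A pointer at or above top refers to a vacated slot: follow its forwarding pointer.
        return heap[pointer].forward if pointer >= top else pointer

    # Second pass: readjust pointers to moved cells.
    for cell in heap[:top]:
        cell.fields = [fix(p) for p in cell.fields]
    return top, [fix(r) for r in roots]
```

The returned `top` plays the role of the top-of-heap pointer: allocation can resume from that index once the pass completes.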
Some systems use compaction as an independent storage management technique to back up another garbage collection scheme. For example, in BrouHaHa Smalltalk [Miranda, 1987] the allocator checks that the total size in free cells is sufficient and if so invokes the compactor. A mark-and-sweep garbage collector is used as a last resort if compaction would prove futile.

3.0 Incremental and Concurrent Collectors

Section 2 identified three processes associated with garbage collection: mutation (M), identification (I) and reclamation (R). What distinguishes the majority of collectors up to [Cohen, 1981] is that these processes are sequenced. As reference counting reclaims garbage as soon as it is detected, mutation can be followed by cascades of IR operations as a result of recursive freeing. In contrast, indirect identification postpones collection until the free store is exhausted; only at the end of each MIR cycle is the store, generally, in a consistent state.

As Ungar [1984] reported, Fateman found that mark-and-sweep takes up 25% to 40% of the computation time of Franz-Lisp programs. Wadler [1976] reported that typical Lisp programs spend from 10% to 30% of their time performing collection. As such, mark-and-sweep is unsuitable for reactive (interactive and realtime) applications, because even if the garbage collector goes into action infrequently, on such occasions as it does it requires large amounts of time. While reference counting is somewhat better in this respect because the grain size of the processes is smaller, a significant amount of time is spent in identification [Steele, 1975; Ungar, 1984]. Every mutator operation on a cell requires that the counts of its referents be adjusted. Furthermore, significant time is spent in recursive freeing: 5% on Berkeley Smalltalk and 1.9% on Dorado Smalltalk implementations [Ungar, 1984].
Because recursive freeing is unbounded, the simple form of reference counting in which the collector immediately reclaims all the cells freed by a mutation is also unsuitable for reactive applications.

3.1 Deferred, direct collectors

The overhead of immediate reference counting can be reduced by deferring recursive freeing. Using doubly linked freelist store management [Weizenbaum, 1962; 1963], a newly deallocated cell can be placed on the end of the freelist but its referents not immediately processed. This cell is considered for reuse when it advances to the head of the list. Only at this time are the counts of its referents decremented; any falling to zero are added to the end of the freelist. This deferred reference counting technique is time efficient and provides a smoother collection policy, one not so vulnerable to the unbounded mutator delays of immediate reference counting. However, it is no longer true that after each MI operation all garbage has been identified, let alone reclaimed. Collectors which, by design, do not necessarily identify and reclaim all garbage in a single invocation are said to be incremental.

A similar scheme is that of Glaser and Thomson [Field and Harrison, 1988], which uses a to-be-decremented stack instead of a doubly linked list. In this scheme cells are added to the to-be-decremented stack if they have a count of one which requires decrementing. When cells are allocated from the stack their count is already one, hence this scheme manages to elide many garbage identification operations.

Deutsch and Bobrow [Deutsch and Bobrow, 1976] observe that, frequently, over a series of reference counting operations the net change in a cell's count will be small, if not nil. For example, when duplicating a cell reference as a stack parameter to a procedure call, the cell will acquire a reference that will be lost once the procedure returns. If adjusting such volatile references can be deferred, many garbage identification operations can be eliminated.
Baden [1983] proposes such a scheme for Smalltalk-80, which was used by Miranda [1987]. References to cells from roots, such as the stack, are not included in a cell's count. Instead, root references to cells are recorded in the Zero Count Table (ZCT). If a reference to a new cell is pushed on the stack (the typical way by which new cells join the computation graph), it is placed in the ZCT since it has a zero count and is only referenced from the stack. When a nonroot reference counting operation causes a cell's count to fall to zero the cell is also placed in the ZCT because it might be referenced from a root.

If the ZCT fills up or when no more free store is available, the collector initially attempts to reclaim cells in the ZCT. Firstly, reference counts are stabilized, that is, made consistent, by increasing the count of all cells referred to from the roots. The ZCT is emptied by scanning and any referenced cell with a zero count is freed. Finally, the stack is scanned and the counts of all cells referred to from the stack are decremented. During this process any cells whose count returns to zero are placed in the ZCT, since they are now only referenced from the stack.

Using this technique, stack pushes and pops reduce to ordinary data movement operations, that is, they can be made without identification operations. Baden's measurements of a Smalltalk-80 system suggest that this method eliminates 90% of the reference count manipulations, and reduces the total time spent on reference counting by half [Baden, 1983]. A slight disadvantage is that sweeping the ZCT causes a pause in mutation; however, typical pause times are of a few milliseconds [Miranda, 1987]. A further disadvantage is the extra storage required by the ZCT between reclamations.
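The three reclamation steps above (stabilize, scan the ZCT, unstabilize) can be sketched as follows. This is an illustrative model rather than Baden's or Miranda's implementation: the class and method names (`Heap`, `new`, `store`, `remove`, `reclaim`), the use of a Python set for the ZCT and the list `freed` that records reclaimed cells are all assumptions of the sketch.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics keep cells hashable and distinct
class Cell:
    name: str
    refs: list = field(default_factory=list)  # pointer fields (counted references)
    count: int = 0  # counts nonroot references only
    freed: bool = False


class Heap:
    def __init__(self):
        self.stack = []   # root references, not included in any count
        self.zct = set()  # Zero Count Table
        self.freed = []   # names of reclaimed cells, in order of reclamation

    def new(self, name):
        """Allocate a cell and push it on the stack; it enters the ZCT."""
        cell = Cell(name)
        self.zct.add(cell)
        self.stack.append(cell)
        return cell

    def store(self, holder, target):
        """Nonroot pointer store: an ordinary counted reference."""
        holder.refs.append(target)
        target.count += 1

    def remove(self, holder, target):
        """Nonroot pointer deletion; a zero count sends the cell to the ZCT."""
        holder.refs.remove(target)
        target.count -= 1
        if target.count == 0:
            self.zct.add(target)

    def reclaim(self):
        for cell in self.stack:  # 1. stabilize counts with root references
            cell.count += 1
        pending, self.zct = list(self.zct), set()
        while pending:  # 2. empty the ZCT, freeing zero-count cells
            cell = pending.pop()
            if cell.count == 0 and not cell.freed:
                cell.freed = True
                self.freed.append(cell.name)
                for referent in cell.refs:
                    referent.count -= 1
                    if referent.count == 0:
                        pending.append(referent)
                cell.refs = []
        for cell in self.stack:  # 3. remove the root contribution again
            cell.count -= 1
            if cell.count == 0:
                self.zct.add(cell)
```

Note that pushing and popping `stack` involves no count manipulation at all; only `store`, `remove` and `reclaim` touch the counts.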
3.2 Concurrent mark-and-sweep

The major advantage of deferred reference counting is that garbage collection is fine grained and interleaved with mutation, making it suitable for interactive and realtime applications [Goldberg, 1983]. The major disadvantage of indirect identification is the long interruptions of the mutator by the collector. Dijkstra [1978] described a modification of mark-and-sweep in which the mutator and the collector operate concurrently. Put another way, the collector operates on-the-fly. It was in the context of this algorithm that the terminology of mutator and collector processes was coined.

In the simple mark-and-sweep scheme of Section 2, concurrency is prevented by interference of identification by the mutator. If a reference to a new cell is added after the sweep has passed over it, the new cell will not be correctly identified as part of the computation graph. Dijkstra achieves a decoupling of the mutator from the collector by introducing a third state for a cell. The three states, referred to as colours: white (unmarked), black (marked) and grey, can be represented by two mark bits. The mutator prevents collection of a newly allocated white cell by turning it grey at the time of allocation. Marking blackens any cell traced from a root. Cells will be either black, grey or white. As previously, white cells are unreachable from the roots. Grey cells will be those allocated since the last collection but missed during the marking phase. In the sweep phase white cells are reclaimed and other shades are whitened. Baker [1992] has recently proposed a realtime collector similar to Dijkstra's where any invocation of the collector is bounded in time.

3.3 Scavenging collectors

The generality and modularity of mark-and-sweep account for the attention it has received in the past three decades. It can however be inefficient because of its global nature.
The marking phase inspects all accessible cells while the sweeping phase traverses the whole store. The sweep time is proportional to the size of the store and in virtual memory systems, the collector may access numerous pages on secondary store, an inherently slow process.

When the store is managed as a heap the costly sweep phase of mark-and-sweep collectors can be eliminated by combining the identification and collection phases. This requires two heaps, historically called semispaces [Baker, 1978]. The mutator begins operating in the fromspace. When there is no free space, the collector scavenges fromspace. A scavenge is a simultaneous traversal and copy of the computation graph from the fromspace to the tospace. This combination of copying and tree traversal has the added advantage of improving locality. When each cell is moved to tospace a forwarding pointer is left behind. After a scavenge, the fromspace becomes free, and can be reused. The two semispaces are flipped and the mutator continues.

Baker's original scheme is also realtime. Collection is interleaved with mutation but any invocation of the collector is bounded. A consequence of this is that the mutator must handle forwarding pointers. If the mutator encounters a reference to a forwarding pointer it updates the reference, so avoiding subsequent forwarding. Scavenging schemes trade space for time since they require two heaps. Consequently they have much higher space overheads than either mark-and-sweep or reference counting algorithms.

3.4 Generational scavengers

Lieberman and Hewitt [1983] observed that most newly created cells die young, and that long-lived cells are typically very long-lived. Their collector segregates cells into generations, each with its own pair of semispaces. Each generation may be scavenged without disturbing older ones, giving rise to incremental collection. Younger generations are scavenged more frequently.
The youngest generation will be filled most rapidly, but when flipping very few of its cells survive. This drastically reduces the amount of copying needed to maintain the generation. Generations can be created dynamically when the youngest generation fills up with cells that survive several flips.

Ungar's [1984] generation scavenging collector exploits the same cell lifetime behaviour as Lieberman and Hewitt. This collector classifies cells as either new or old. Old cells reside in a region of memory called Old Space (OS). All old cells that reference new ones are members of the Remembered Set (RS). Cells are added to RS as a side effect of the mutator. Cells that no longer refer to new cells are removed from RS when scavenging. All new cells must be reachable from cells in RS. Thus, RS behaves as roots for new cells and any traversal of new cells can start from RS.

Three heaps are used for new cells: new space (NS) (a large nursery heap where new cells are spawned); past survivor (PS) space (which holds new cells that have survived previous scavenges), and future survivor (FS) space (which remains empty while the mutator is in operation). A scavenge copies live new cells from NS and PS to FS space, and flips PS and FS. At the end of the scavenge, no live cells are left in NS and it can be reused. Cells that have survived more than a prescribed number of flips are moved to OS, a process called tenuring.

With Ungar's collector the mutator is stopped during scavenging. This allows dispensing with forwarding pointers which achieves performance gains. While explicitly not concurrent, the collector is incremental: because generations are small, pause times are short. By carefully tailoring the size of NS, FS and PS an implementation of Ungar's scheme for Smalltalk manages to keep scavenge times to a median of 150 milliseconds occurring every 16 seconds [Ungar, 1984].
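The structure of a generation scavenge with a remembered set can be sketched as follows. This is a simplified model under stated assumptions, not Ungar's implementation: the spaces are Python lists, copying is modelled by moving objects between lists (so no addresses change), the tenuring threshold `TENURE_AGE` is arbitrary, and the names `GenHeap`, `store` and `scavenge` are hypothetical. Explicit roots are traversed alongside RS.

```python
from dataclasses import dataclass, field

TENURE_AGE = 2  # assumed threshold: tenure after surviving more than 2 scavenges


@dataclass(eq=False)  # identity semantics keep cells hashable and distinct
class Cell:
    name: str
    refs: list = field(default_factory=list)  # pointer fields
    age: int = 0  # number of scavenges survived
    old: bool = False  # True once tenured to Old Space


class GenHeap:
    def __init__(self):
        self.ns, self.ps, self.fs, self.os = [], [], [], []
        self.rs = set()  # Remembered Set: old cells that reference new cells
        self.roots = []

    def new(self, name):
        cell = Cell(name)
        self.ns.append(cell)  # new cells are spawned in the nursery
        return cell

    def store(self, holder, target):
        holder.refs.append(target)
        if holder.old and not target.old:
            self.rs.add(holder)  # RS is maintained as a side effect of mutation

    def scavenge(self):
        """Move live new cells to FS or OS; return the number of cells reclaimed."""
        work = [c for c in self.roots if not c.old]
        for old_cell in self.rs:
            work.extend(c for c in old_cell.refs if not c.old)
        seen, tenured = set(), []
        while work:
            cell = work.pop()
            if cell in seen:
                continue
            seen.add(cell)
            cell.age += 1
            if cell.age > TENURE_AGE:
                cell.old = True  # tenuring
                self.os.append(cell)
                tenured.append(cell)
            else:
                self.fs.append(cell)
            work.extend(c for c in cell.refs if not c.old)
        reclaimed = len(self.ns) + len(self.ps) - len(seen)
        self.ns = []  # no live cells remain in NS
        self.ps, self.fs = self.fs, []  # flip PS and FS
        candidates = self.rs | set(tenured)
        self.rs = {c for c in candidates if any(not r.old for r in c.refs)}
        return reclaimed
```

Old Space is never traversed here, so garbage in OS is not reclaimed by `scavenge`; in this sketch that is left to some other collector.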
Although generational collectors collect intragenerational cycles, they cannot collect intergenerational cycles, that is, cycles of references through more than one generation. Further, some schemes do not attempt to scavenge older generations. Ungar [1984] leaves the reclamation of such garbage to offline reorganization, where a full garbage collection is done after the system has stopped. The current ParcPlace [1991] Smalltalk-80 generational garbage collector is backed up by an incremental collector, a mark-and-sweep collector, and a compactor which garbage collects OS.

Although generation collectors are one of the most promising collection techniques, they suffer poor performance if many cells live a fairly long time, the so-called premature tenuring problem. Ungar and Jackson propose an adaptive tenuring scheme based on extensive measurements of real Smalltalk runs [Ungar, 1988; 1992]. This scheme varies the tenuring threshold depending on dynamically measured cell lifetimes. It also proposes a refinement that has been included in the ParcPlace [1991] collector. In systems like Smalltalk, interactive response is at a premium but the system contains many large cells that don't contain references to other cells, mainly bitmaps and strings. To avoid copying these cells they are segregated in a LargeCellSpace, and tenured to OS when necessary.

A generational scavenging collector that adapts to the allocation patterns of applications was recently presented by Hudson and Diwan [1990]. This generational scavenging collector has a variable number of fixed size (power of 2) generations. The generations are placed in store at contiguous addresses. The generation number is apparent from the most significant address bits. Each generation has its own tospace, fromspace, and RS (remembered set). RS is fed indirectly via a buffer containing addresses of possible intergenerational pointers. The feeder may filter out duplicates, intragenerational pointers, and nonpointers.
When scavenging more cells than a generation can accommodate, a new generation is inserted. To retain the ordering, the younger generations are shuffled backwards during scavenging. Other generation-based collectors include: opportunistic collectors [Wilson and Moher, 1989]; ephemeral collectors and the Tektronix Smalltalk collector. In terms of usage, all three commercial U.S. Smalltalk systems (DigiTalk, Tektronix and ParcPlace systems) have adopted generational automatic storage reclamation [Ungar and Jackson, 1988]. The SML NJ compiler [Wilson, 1992] also uses a generational collector. Deimer et al [1990] have investigated a generational scheme combined with a conservative mark-and-sweep garbage collector designed for use with Scheme, Mesa and C intermixed in one virtual memory. Wilson, Lam and Moher [1990] show that, typically, generational garbage collectors have poor locality of reference, but careful attention to memory hierarchy issues greatly improves performance. They attributed the small success recorded by several researchers in their attempts to improve locality in heaps to two flaws in the traversal algorithms. They failed to group data structures in a manner reflecting their hierarchical organization, and more importantly, they ignored the disastrous grouping effects caused by reaching data structures from a linear traversal of hash tables (i.e. in pseudo-random order). Incremental collectors that copy cells when the mutator addresses them have also been looked at by White [1980] and Kolodner [Kolodner et al, 1989; Kolodner, 1991]. These reorder cells in the order they are likely to be accessed in the future, giving improved locality. However, the technique requires special hardware. Other reordering optimizations that don't require special hardware work by reordering pages within larger units of disk transfer [Wilson, 1992]. 
4.0 Distributed Collectors
Following Hudak and Keller [1982] distributed collectors are characterized by: i) a set of nodes, each comprising any number of processors sharing a single address space; ii) a communication network connecting the nodes; iii) a portion of the computation graph held on each node; and iv) at least one mutator on each node. In distributed systems, processing is distributed over all nodes. Each node has direct access only to cells that reside in its local heap. A reference to a cell in the same node is said to be local. A reference to a cell on another node is said to be remote. Access to a remote cell is achieved by sending a message to the node that holds it, which then performs any necessary operation. The issues of distributed garbage collection are very much the issues of distribution: i) concurrency, communication and synchronization; ii) communication overheads; iii) messages may be lost, delivered out of order or duplicated; iv) fault tolerance. After discussing the effects of distribution on the computation graph the following sections present various distributed collectors based on the previous taxonomy. The final section addresses fault tolerance issues. Table 1 summarizes the main characteristics of the collectors described.
4.1 Distributed computation graphs
To exploit the parallelism of a distributed system, the computation graph has to be distributed over all nodes. The vertices of the graph are naturally partitioned according to physical distribution, but there is no principle that prevents a cell migrating between nodes. Each node could contain roots of the graph but it is more usual that the roots lie on the node on which the computation was initiated. A remote reference is necessarily indirect. It first references a local export record. The export record references an entry record on a remote node. In turn, the entry record directly references the remote cell.
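The chain from export record to entry record to cell can be made concrete with a small model. This is only a sketch under simplifying assumptions: the `Node` class and its method names are hypothetical, the "network" is a shared dictionary, and a message to the owning node is stood in for by a direct method call.

```python
class Node:
    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network          # stands in for the communication network
        network[node_id] = self
        self.cells = {}                 # local heap: cell id -> value
        self.entries = {}               # entry records: entry id -> local cell id
        self.exports = {}               # export records: export id -> (node id, entry id)

    def publish(self, cell_id):
        """Create an entry record so that other nodes may reference a local cell."""
        entry_id = len(self.entries)
        self.entries[entry_id] = cell_id
        return (self.node_id, entry_id)

    def import_ref(self, remote):
        """Create a local export record that points at a remote entry record."""
        export_id = len(self.exports)
        self.exports[export_id] = remote
        return export_id

    def read(self, export_id):
        """Follow export record -> entry record -> cell."""
        node_id, entry_id = self.exports[export_id]
        # In a real system this would be a message to the owning node.
        return self.network[node_id].handle_read(entry_id)

    def handle_read(self, entry_id):
        """Performed by the owning node on behalf of the remote caller."""
        return self.cells[self.entries[entry_id]]
```

The mutator on one node only ever holds an export id; the owning node is free to move the cell locally as long as it updates its entry record.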
The import and export records might naturally be grouped in tables but the export record could equally well be a proxy cell. The triple indirection causes some overhead for a remote reference which adds another dimension to the problem of nonlocality. The entry table acts as additional local roots for the local partition of the computation graph. The local roots and the entry table will allow the local part of a graph to be collected independently. Given the potential parallelism, incremental and concurrent collectors appear the most appropriate for distributed systems. The problem of collection, then, naturally decomposes into the problem of local collection and global collection of the entry and exit tables. Further tables may be used to record the cells they reference remotely. ElHabbash, Horn and Harris [1990] use an additional private table. The private table provides location independent addressing. Storage is partitioned into clusters, each with its own set of tables. A cluster is a logical partition of cells (a passive node) in contrast to the natural physical partition (of active nodes). A cluster is a group of cells which are expected to form a locality set. Cells in the cluster reference other clusters via defined ports. The import table gives a location hint about each external cell referenced from the cluster. The export table is the entry point for the public cells in the cluster which can be externally referenced. Public cells in the cluster are given unique public identifiers (PIDs). Private cells are not known outside the cluster and can only be referenced by the cells in the same cluster. The private cells are given local identifiers (LIDs), which are, in fact, private table entries in the cluster. Clusters are the unit of management, the objective being to increase the locality of reference within a cluster. Removing nonreferenced cells from a cluster is considered a contribution to increasing the locality of reference of the cluster. 
Subgraphs which are only reachable from the export table may be removed to that cluster's archival cluster. Whenever an archived cell is referenced from any cluster, that cell and its subgraph are moved into the cluster. In this way, cells may migrate from cluster to cluster, via archival clusters. Archived cells which are not referenced from any cluster will remain in the archival cluster. Starting from the roots in the cluster, and traversing the subgraphs rooted at them, any cells connected in these graphs must remain in the cluster. The other cells which are not reachable from the roots are moved away to maintain a high locality of reference in the cluster. Nonreachable public cells in the cluster cannot be considered as garbage because they may be referenced from other clusters, but on the other hand they are not part of the locality in the cluster. The private cells which are not reached from any public cells (roots or nonroots) in the cluster are definitely garbage, and can be reclaimed. Archival collection is controlled by setting time limits. A similar approach is used by Moss [1990] in the Mneme project. Mneme structures the heap of cells into files. A file has a set of persistent roots and contains a collection of cells that can refer to each other using short cell identifiers. Cells in one file can refer to cells in other files via a device called a forwarder. A forwarder is a local standin or proxy for a cell in another file. Thus, to refer to a cell in another file, one refers to a local cell marked as a forwarder; the forwarder can contain arbitrary information about how to locate the cell at the other end. Each file can be garbage collected independently. Moss calls the import table the incoming reference table (IRT). Both the Moss and El-Habbash collectors are intended for use in a persistent environment. 
4.2 Distributed direct identification of garbage
The locality of identification in reference counting has a number of attractive consequences for distributed systems. The collector visits cells only when the mutator does. Cells can be reclaimed locally as soon as they become inaccessible. One of the earliest distributed reference counting collectors performs all of the reference counting operations by spawning remote asynchronous tasks on appropriate processors [Hudak and Keller, 1982]. This ensures that actions are atomic. The nontrivial part of the adaptation is to guarantee that identification operations (increment and decrement reference counts) are executed in the order they were generated. If this were not the case, a reference count might prematurely reach zero. Simple remote reference counting requires synchronization of communication between cooperating nodes. Lermen and Maurer [1986] ignore part of the problem by assuming that the underlying communication protocol preserves the order of messages. The assumption can be enforced if the system provides either fixed routing or a message protocol that indicates the order in which messages are sent. An extension of reference counting which eliminates both synchronization and the need to preserve the order of messages is weighted reference counting (WRC). It was developed independently by Thomas [1981], Watson and Watson [1987] and Bevan [1987]. The idea is that each cell is allocated a standard reference count when created and at all subsequent times the sum of weights on the pointers to a cell is equal to the reference count. A reference with a weight W is equivalent to W references each with a weight 1. When a reference is duplicated it is unnecessary to access the cell. Rather, the weight of the pointer is equally divided between itself and the copy. The sum of the weights then remains unchanged.
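The weight-splitting rule can be illustrated with a minimal sketch. The names `MAX_WEIGHT`, `Cell`, `Ref`, `new_cell`, `copy_ref` and `drop_ref` are hypothetical, and the initial weight is an arbitrary power of two chosen for the example. Deleting a reference is modelled in the usual WRC manner, by subtracting the pointer's weight from the cell's count.

```python
MAX_WEIGHT = 1 << 16  # hypothetical initial weight; any power of two works


class Cell:
    def __init__(self):
        self.count = MAX_WEIGHT   # must always equal the sum of pointer weights
        self.reclaimed = False


class Ref:
    def __init__(self, cell, weight):
        self.cell = cell
        self.weight = weight


def new_cell():
    cell = Cell()
    return cell, Ref(cell, MAX_WEIGHT)


def copy_ref(ref):
    """Duplicate a reference without touching the cell: split the weight."""
    if ref.weight == 1:
        # A weight of one cannot be halved; this sketch does not handle it.
        raise ValueError("weight exhausted")
    half = ref.weight // 2
    ref.weight = half
    return Ref(ref.cell, half)


def drop_ref(ref):
    """Destroy a reference: the only step that needs a message to the cell."""
    ref.cell.count -= ref.weight
    ref.weight = 0
    if ref.cell.count == 0:
        ref.cell.reclaimed = True
```

Note that `copy_ref` never reads or writes the cell, which is why copying a remote reference needs no communication.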
In this respect, WRC can be understood as a generalization of singlebit reference counting when the bit is located with the pointer. The advantage for distribution is that no communication is required when a remote reference is copied. When a reference is destroyed, however, the pointer weight must be decremented from the reference count of the cell in order to preserve the rule that the sum of the weights must equal the reference count. As usual, if a cell's count falls to zero it can be reclaimed. Because the reference weight is always a power of two to allow for duplication, the log of the weight can be stored instead of the whole weight. This provides an important reduction in the space requirement for each reference. However, when a weight is to be subtracted from a count it must be converted (by shifting). Indirection is used to handle underflow which occurs when a reference weight of one needs to be copied. An unfortunate consequence of indirection is that a reference, its indirection and the cell to which it refers may reside on different nodes. In this case, accessing a cell requires additional messages. Generational reference counting (GRC) [Benjamin, 1989] solves this problem. Each reference is associated with a generation. Each cell is initially given a zero generation reference; any copy of an ith generation reference is an (i+1)th generation reference. Each cell has a table, called a ledger, which keeps track of the number of outstanding references from each generation. If a cell's ledger has no outstanding references from any generation, then the cell is garbage and its space can be reclaimed. GRC has a significantly lower communication overhead but greater computational and space requirements than ordinary reference counting. Its communication overhead is similar to WRC, namely one acknowledged message for each copy of an interprocessor reference and a corresponding extra space associated with each reference.
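A ledger of the kind just described can be sketched as follows. The text above specifies only the generations and the ledger; the per-reference counter of copies made (`copies`) and the exact update performed when a reference is dropped follow the usual presentation of the scheme and should be read as assumptions of this sketch, as should the names `Cell`, `Ref`, `copy_ref` and `drop_ref`.

```python
from collections import defaultdict


class Cell:
    def __init__(self):
        self.ledger = defaultdict(int)  # generation -> outstanding references
        self.ledger[0] = 1              # the initial zero-generation reference

    def is_garbage(self):
        return all(n == 0 for n in self.ledger.values())


class Ref:
    def __init__(self, cell, generation):
        self.cell = cell
        self.generation = generation
        self.copies = 0                 # copies made from this reference


def copy_ref(ref):
    """Copying is purely local: no message is sent to the cell."""
    ref.copies += 1
    return Ref(ref.cell, ref.generation + 1)


def drop_ref(ref):
    """Dropping sends one message: retire this reference, announce its copies."""
    ledger = ref.cell.ledger
    ledger[ref.generation] -= 1
    ledger[ref.generation + 1] += ref.copies
```

A ledger entry may be temporarily negative when a copy is dropped before the reference it was copied from; the cell is treated as garbage only once every entry is zero, so the order in which drop messages arrive does not matter in this model.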
Vestal [1987] describes a collector that uses a distributed fault tolerant reference counter. Each cell maintains a conservative list of sites referencing it. Each site of this list keeps the count of references it has for that cell. Atomic update of the list is required when a site first references a cell. The cycle-detection algorithm is seeded with some cell suspected of being part of a dead cycle. The algorithm essentially consists of trial deletion of the seed and checking if this brings all the counts in the cycle to zero.
4.3 Distributed indirect collectors
One of the first distributed indirect identification collectors was the marking-tree collector [Hudak and Keller, 1982]. It is an adaptation to a distributed environment of the previously described Dijkstra [1978] concurrent mark-and-sweep. Each mutator and collector on each node has its own task-queue. Each task locks all cells it intends to access to prevent race conditions. To prevent deadlock, if a task finds that some cell was already locked, all locked cells are released and the task is requeued. Since cells involved in a task may reside on different processors, this locking mechanism introduces high processing time and communication overhead when the collector and the mutator have high degrees of contention for shared cells. There is a single root of the whole distributed graph. The collector collects one node after another beginning with the root node. It can reclaim all garbage including cycles. The marking-tree collector operates in a functional graph reduction environment and need not handle arbitrary pointer manipulation. Because it does not batch remote mark tasks, it imposes high message traffic. Space needed for storing these requests cannot be determined in advance. Similar mark-and-sweep collectors also inspired by Dijkstra's parallel collector were described by Augusteijn [1987] and Vestal [1987].
All processors cooperate in both phases of the collection but marking can proceed in parallel with mutation. In Vestal's [1987] collector, the cell space is split into logical areas in which parallel collection may occur. Areas are a logical grouping of cells, and there is no control over site boundary crossing. The space overhead is proportional to the number of cells and to the number of areas, since each cell maintains an array of four colours for each existing area in the system. This collector does not take advantage of locality: each collector performs a global transitive closure starting at the root of one area, hence crossing boundaries. Mohammed-Ali [1984], Hughes [1985] and Couvert [see Shapiro et al, 1990] describe variants of mark-and-sweep collectors applicable to the distributed environment. For these, all nodes synchronise at the start of a local mark phase; at the end they perform a global rendezvous to exchange information about the global reachability. Each node then proceeds in parallel to a local sweep phase. A global rendezvous is inherently costly and nonscalable. Mohammed-Ali [1984] presented two different approaches, 'global' and 'local' collectors with minimal space overheads. In the global approach, mutation is globally suspended for the entire collection. The collector handles arbitrary pointer manipulations and resolves some of the space and communication problems of the marking-tree collector. Mohammed-Ali's [1984] 'local' collector simplifies collection by simply abandoning the attempt to recover cyclic garbage that spans several nodes. Each node asynchronously and independently performs local collection without involving any other node. If the freed storage is large enough the node's mutator will continue. Otherwise, it will invoke global collection. To allow a node to perform local garbage collection, it has to know which of its local cells are reachable from remote cells.
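One conservative way to meet this requirement is to treat every local cell that is recorded as having an incoming remote reference as an additional root. The sketch below illustrates that idea only; it is not Mohammed-Ali's algorithm, and the function name `local_collect`, the `in_table` parameter and the dictionary representation of the heap are assumptions made for the example.

```python
def local_collect(heap, local_roots, in_table):
    """Mark-and-sweep one node's heap without contacting any other node.

    heap:        dict mapping a local cell id to the local cell ids it references
    local_roots: cell ids reachable from the node's stack, globals or registers
    in_table:    cell ids recorded as referenced from other nodes
    Returns the set of reclaimed cell ids.
    """
    marked = set()
    stack = list(local_roots) + list(in_table)  # remote referents act as roots
    while stack:
        cell_id = stack.pop()
        if cell_id in marked or cell_id not in heap:
            continue
        marked.add(cell_id)
        stack.extend(heap[cell_id])
    garbage = set(heap) - marked
    for cell_id in garbage:                     # sweep
        del heap[cell_id]
    return garbage
```

The price of this independence is that a cell kept alive only by a stale table entry, or by a cycle that passes through another node, survives every local collection until a global collection corrects the table.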
Cells that have references from other nodes are assumed to be accessible in each local garbage collection. This situation persists until the next global collection invocation. In the collectors given by Mohammed-Ali, the issue of lost or in-transit messages is solved by first assuming that the communication channel between each pair of nodes is order-preserving. An alternative solution is to keep message counts in each node. Before a garbage collection is completed, a check is made to ensure that the number of reply messages equals the message count. The space overhead of the collectors is not easily determined. In addition to InTable and OutTable which keep track of incoming and outgoing references, there is TempTable that keeps in-transit references and several message queues. Hughes' collector [Hughes, 1985] is based on Mohammed-Ali's 'local' collector but reclaims cyclic garbage. Its main idea is to pipeline a number of collections over the entire network. This is achieved with the use of a synchronous termination detection algorithm based on instantaneous communication. Synchronous termination, however, may invalidate the collector for architectures comprising many nodes. On the other hand, the approach may be unsuitable when local heaps are large since the contribution of one node must always consist of a complete scan of its local heap. In a special operating mode the creation of a remote reference has to be accompanied by an access to the referenced node [Rudalics, 1986]. A modification of the generation scavenging used for Berkeley Smalltalk [Ungar, 1984] was given by Schelvis and Bledoeg [1988] for a distributed Smalltalk collector. In addition to OS, NS, PS and FS which hold cells according to their age, there is an additional subspace, RS, that contains all replicated cells. RS is like OS, except that it contains the same cells in the same order on every node. Newly created cells are stored in NS.
When NS becomes full, it and PS are garbage collected by scavenging. The roots of the computation graph are the set of new and survivor cells referenced from OS, RS or remote nodes. This root set is dynamically updated by checking on stores of pointers to NS. All cells in the graph are moved to NS, except for sufficiently old cells, which are moved to OS. At the end of a traversal NS is empty. Since most new cells soon die, PS fills up relatively slowly and, therefore, collection of the much bigger OS and RS is necessary less frequently. Detection of dead cells in the distributed system is accomplished by a system wide mark-and-sweep collector. All nodes are checked if they have pointers to a particular cell. The graph of living cells is traversed, the cells accessed are marked, and at the end the space of unmarked cells is reclaimed or "swept". Although the global mark-and-sweep collector handles both local and distributed cycles well, it does not work properly when not all nodes are able or willing to cooperate.
4.4 Hybrid collectors
When local collectors are independent they need not be homogeneous. One node may employ reference counting, another concurrent mark-and-sweep. Global and local collection may employ different collectors. Bennett [1987] describes a scheme which uses both a reference counting collector and a mark-and-sweep collector in his prototype distributed Smalltalk-80 system. A single table in each node, the RemoteCellTable (RCT) holds local cells that are remotely referenced. Bennett relies on facilities provided by the local Smalltalk memory manager to enumerate local cells (proxy cells) that indirectly reference remote cells. There are two distributed garbage collectors in Bennett's scheme, a fast algorithm that does not reclaim internode cycles, and a slower one that does. The algorithms are initiated by a user on one of the nodes.
The first reference counting collector relies on remotely referenced cells in alternating collection phases being distinguishable. Each cell has a flag in the RCT that identifies cells created since the start of a collection phase. These are similar to the grey cells of Dijkstra's [1978] collector. During each phase, each node enumerates its local proxies and sends a message for each proxy that increases the external reference count of the remote cell in its RCT entry. After this marking phase all remotely referenced cells have a nonzero external reference count. Each node then scans its RCT and removes those cells with a zero external reference count that were created before the start of the collection. Any such cells not referenced locally will be reclaimed by the node's local garbage collector. This algorithm does not detect and reclaim internode cycles. The second, slower collector is a distributed mark-and-sweep algorithm that proceeds from those cells in the RCT that also have local references. These cells are followed for references to proxies and messages are sent to the remote nodes of these proxies to continue the scan remotely. (Bennett's system is implemented on PS Smalltalk which employs deferred reference counting. The internal reference count of a cell is therefore readily available.) At the end of this phase internode cycles will not have been marked and can be removed from the RCT. DeTreville [1990] combines reference counting and mark-and-sweep in a concurrent collector for Modula-2+. The collector was used in a distributed workstation/server shared-memory multiprocessor environment. Each address space can have multiple threads of control which share a coherent view of the address space’s contents. Each address space has its own separate instance of the Modula-2+ collector. Communication between address spaces, on the same machine or across a network, is via Remote Procedure Call. 
Assignments to references that are potentially shared among threads (i.e., those that are not local variables) are logged on a transaction queue, which the collector reads asynchronously. The reference-counting collector reclaims most garbage; much less frequently the mark-and-sweep collector is used to reclaim cyclic structures. 4.5 Fault tolerant collection Local collection is a process which nodes are free to apply to their local store when necessary. During local collection no remote nodes are involved, only remote reference sets are accumulated. When local collection has terminated these reference sets are sent to appropriate nodes. It is not necessary that every remote node picks up this information immediately. It is even possible that it is not received at all, e.g. when the receiving node is currently down. The reference set of the next local collection will be sufficient. As a result some garbage may be kept alive longer than is necessary. In fact, as long as some node is down or inaccessible, garbage on the node will remain uncollected. The reference sets are guaranteed to include every remotely referenced cell, so no living cells will be collected as a result of incomplete information. Since the information exchange is the only interaction necessary between nodes and since there are no rules prescribing some time order or any other dependency between the local collection activities of different nodes, there are no synchronization problems. A collector for the distributed detection of garbage which offers a low-level distributed cell-support system was given by Shapiro et al [1990]. This collector focuses only on an OS-level realization of garbage detection. It is based on the realistic assumptions that messages may be lost, delivered out of order, or duplicated; nodes may crash; cells may migrate or be deleted. 
The protocol is fully parallel, and uses only information local to each site and information exchanged between pairs of sites; no global mechanism is necessary. The collectors' interface is designed for maximum independence from other components. Shapiro et al detail various message protocols. Given a reference, the finder protocol locates the cell referred to. This protocol also handles cell deletion and node crashes. Other protocols include reference-sending, cell-migration, cycle detection and abnormal termination protocols. To deal with lost messages or those in transit, events are timestamped by a local, monotonically increasing clock. Each transmitted message is stamped with the value of the clock on transmission. In Shapiro et al, [1990], the universe of cells is subdivided into disjoint spaces. Each space maintains the vector of highest timestamps received from other nodes. Each disjoint space maintains a list of potential incoming and outgoing references, called respectively the Cell Directory Table (CDT) and the External Reference Table (ERT). A CDT entry is stamped with the clock value of the last received message. When a mutator exports a reference to another node, it is first added to the local CDT. Both the CDT and the ERT are overestimates. Local garbage collection proceeds from both local roots and the CDT and will remove garbage entries in the ERT. In turn, this allows previously referenced CDTs to be collected. The interface between the global collector and other components (i.e. the mutator and the cell finder) is limited to just the CDT and ERT. Updates to a CDT or ERT can occur in parallel with other activities. No synchronization is needed between the global service and the local collector or mutator. The main weakness of the collector is that it fails to detect interspace cycles of garbage. It proposes migrating locally unreachable cells, leaving cycle removal to a local garbage collector. 
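The timestamping described earlier in this section can be illustrated by showing how a vector of highest received timestamps lets a space discard duplicated or overtaken messages. This is one plausible reading rather than the protocol of Shapiro et al: it assumes each message carries complete state, so that a stale message may safely be ignored, and the names `Space`, `stamp` and `receive` are hypothetical.

```python
class Space:
    def __init__(self, name):
        self.name = name
        self.clock = 0        # local, monotonically increasing clock
        self.highest = {}     # sender name -> highest timestamp received
        self.accepted = []    # payloads that were acted upon

    def stamp(self, payload):
        """Stamp an outgoing message with the clock value on transmission."""
        self.clock += 1
        return (self.name, self.clock, payload)

    def receive(self, message):
        """Accept a message only if it is newer than anything seen from its sender."""
        sender, timestamp, payload = message
        if timestamp <= self.highest.get(sender, 0):
            return False      # duplicate, or overtaken by a later message
        self.highest[sender] = timestamp
        self.accepted.append(payload)
        return True
```

A lost message needs no special handling in this model: the next message from the same sender carries a higher timestamp and supersedes it.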
Total ordering of spaces is used to avoid thrashing but this has its limitations. Lang et al [1992] describe a fault-tolerant distributed collector that is largely independent of how nodes collect their local space and needs neither centralized control nor global stop-the-world synchronization. It allows for multiple concurrent collections, doesn't require migration of cells (cf Shapiro et al) and yet reclaims all garbage cells including distributed cycles. In Lang et al [1992] nodes are organized into 'groups'. A group is a set of nodes willing to cooperate together in a group collection. Nodes cooperate to collect garbage local to a group by means of a concurrent mark-and-sweep collector. Each group gives a unique identifier to each GC cycle. Multiple overlapping group collections can be simultaneously active. When a node fails to cooperate, the group it belongs to is reorganized to exclude it and collection continues. The collector uses export and entry records as described in Section 4.1 but calls them exit and entry items respectively. Entry items have a reference count of exit items referencing them (up to messages in transit). Reclaiming an exit item requires a decrement message to be sent to the referenced entry item. If this action brings its counter to zero, the entry item is reclaimed. This mechanism for reclaiming entry items (the only one available) is safe since noncooperative nodes (or nodes that are down) do not send decrement messages and thus the cells they refer to cannot be reclaimed. Messages with acknowledgements and timeout are used to detect failed or noncooperating nodes. The distributed collection begins with group negotiation. Nodes cooperatively determine group formation. All entry items of nodes within the group are marked w.r.t. the group. An entry item is marked hard if it is "needed outside the group" or it is "accessible from a root of a node in the group". It is marked soft if it is only referenced from inside the group.
The initial marks of the entry items of a group are determined locally to the group by means of a reference counter. The reference counter allows the determination of the number of references that are outside a group. The marks of entry items are then propagated towards exit items through local collection. Similarly, the marks of exit items are propagated towards the entry items they reference (if these are within the group) through group collection. This is repeated until marks of entry or exit items of the group no longer evolve. At this point the group is disbanded. At the end of the marking, all entry items that are directly or indirectly accessible from a root or from a node outside the group are marked hard. Entry items marked soft can only be part of inaccessible cycles local to the group and can thus be safely reclaimed by the reference counting mechanisms. In the case of dead cycles, dead entry items in the cycle eventually receive decrement messages from all the dead exit items that reference them. Hence their reference counts decrease to zero and they are eventually reclaimed by the usual reference counting mechanism. Liskov and Ladin [1986] describe a fault-tolerant distributed garbage detection scheme based on their highly available centralized service. This service is logically centralized but physically replicated and so claims to achieve high availability and fault-tolerance. A client dialogues with a single replica; replicas stay up-to-date by exchanging background "gossip" messages. The failure assumptions are realistic: nodes may crash (in a fail-stop manner) and recover, messages may be lost or delivered out of order. All cells and tables are assumed backed up in stable storage. Clocks are synchronized and message delivery delay is bounded. These requirements are needed for the centralized service to build a consistent view of the distributed system.
Liskov and Ladin's [1986] distributed garbage collector relies on local mark-and-sweep, extended with the ability to identify the part of the graph between some incoming and outgoing reference. Each local collector informs the centralized service about the paths. The root used for tracing is the union of its local root with the set of local public cells. Local collectors query the centralized service about the real accessibility of their public cells to better estimate their root. Dead intersite cycles are detected by the centralized service. Based on the paths transmitted, the centralized service builds the graph of internode references and detects dead cycles with a standard collector. The problem of collection for reliable distributed systems was also addressed by Detlefs [1990a; 1990b; 1991]. Transactions in reliable, distributed systems are serializable and recoverable. An atomic collector must also preserve the consistency of data after hardware (and software) crashes. Thus, each transaction by the collector must be logged. After a crash, recovery can be redone by replaying the log of transactions or, if nonvolatile storage (disk) survives the crash, recovery may use this as the starting point if more efficient. Other work concerned with making garbage collection cooperate transparently with a transaction protocol was done by Kolodner [1989, 1991].
5.0 Summary
An attempt has been made to give some structure to a review of distributed garbage collection. A problem has been that any conceptual scheme has so many exceptions. Collectors were broadly classified as those that identify garbage directly and those that identify it indirectly. Emphasis was given to collectors that appeared since the last major review of garbage collection [Cohen, 1981]. Table 1 gives a summary of characteristics of the distributed collectors described in the review.
The collectors were evaluated in terms of the issues noted in Section 4.0. The following abbreviations are used in the table:
Msg => Message
Ack => Acknowledgement
Cnt => Count
M => Marking
C => Copying
RC => Reference Counting
GS => Generation Scavenging
Comm => Communication
Synchro => Synchronization
Where qualification is required, as in pause, space and communication overhead, a rank of low, medium and high is used. These are relative terms and an order or further explanation is, where available, given in brackets. A comprehensive bibliography on the subject follows. The number of references in the bibliography bears witness to the attention garbage collection is receiving, particularly distributed garbage collection. Despite this attention, a lot still remains to be done. About 80% of the distributed collectors reviewed in this paper have not been implemented.
Acknowledgements
We would like to thank Andrew Nimmo, Tim Kindberg and Xu Wang for reading the draft and offering useful suggestions.
6 Bibliography
Almes G, Borning A and Messinger E (1983) Implementing a Smalltalk-80 system on the Intel 432: a feasibility study, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 175-187.
Appleby K, Carlsson M, Haridi S, and Sahlin D (1983) Garbage collection for Prolog based on WAM, Comm. ACM 31, 719-741.
Augusteijn L (1987) Garbage collection in a distributed environment, in PARLE '87 - Parallel Architectures and Languages Europe, LNCS 259, Springer-Verlag, 75-93.
Baden SB (1983) Low-overhead storage reclamation in the Smalltalk-80 virtual machine, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 331-342.
Baker HG (1978) List processing in real-time on a serial computer, Comm ACM 21, 280-294.
Baker HG (1992) The treadmill: real-time garbage collection without motion sickness, SIGPLAN Notices 27(3), March 1992, 66-70.
Bal H (1990) Programming Distributed Systems, Prentice Hall.
Ballard S and Shirron S (1983) The design and implementation of VAX/Smalltalk-80, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 127-150.
Bartlett JF (1990) A generational, compacting garbage collector for C++, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Bates RL, Dyer D and Koomen JAGM (1982) Implementation of Interlisp on the VAX, ACM Symposium on Lisp and Functional Programming, Pittsburgh, Pennsylvania, 15-18 August 1982, 81-87.
Ben-Ari M (1984) Algorithms for on-the-fly garbage collection, ACM Transactions on Programming Languages and Systems 6, 333-344.
Bennett JK (1987) The design and implementation of distributed Smalltalk, OOPSLA '87, SIGPLAN Notices 22(12), 318-330.
Bengtsson M and Magnusson B (1990) Real-time compacting garbage collection, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Benjamin G (1989) Generational reference counting: a reduced communication distributed storage reclamation scheme, in Programming Languages Design and Implementation, SIGPLAN Notices 24, ACM Press, 313-321.
Bevan DI (1987) Distributed garbage collection using reference counting, in PARLE '87 - Parallel Architectures and Languages Europe, LNCS 259, Springer-Verlag, 176-187.
Boehm HJ and Weiser M (1988) Garbage collection in an uncooperative environment, Software Practice and Experience 18(9), 807-820.
Bobrow DG (1980) Managing reentrant structures using reference counts, TOPLAS 2(3), 269-273.
Brooks RA, Gabriel RP and Steele GL (1982) S-1 Common Lisp implementation, ACM Symposium on Lisp and Functional Programming, Pittsburgh, Pennsylvania, 15-18 August 1982, 108-113.
Brownbridge DR (1985) Cyclic reference counting for combinator machines, in Functional Programming Languages and Computer Architecture, LNCS 201, Springer-Verlag, 273-288.
Carlsson S, Mattsson C and Bengtsson M (1990) A fast expected-time compacting garbage collection algorithm, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Chambers C, Ungar D and Lee E (1989) An efficient implementation of SELF, a dynamically-typed object-oriented language based on prototypes, OOPSLA '89, SIGPLAN Notices 24(10), ACM, 49-70.
Cohen J and Trilling L (1967) Remarks on garbage collection using a two level storage (sic), BIT 7(1), 22-30.
Cohen J (1981) Garbage collection of linked data structures, ACM Computing Surveys 13(3), 341-367.
Collins GE (1960) A method for overlapping and erasure of lists, Comm ACM 3(12), 655-657.
Courts R (1988) Improving locality of reference in a garbage-collecting memory management system, Comm ACM 31(9), 1128-1138.
Dawson JL (1982) Improved effectiveness from a real-time Lisp garbage collector, ACM Symposium on Lisp and Functional Programming, Pittsburgh, Pennsylvania, 15-18 August 1982, 159-167.
Dellar CNR (1980) Removing backing store administration from the CAP operating system, Operating System Review 14(4), 41-49.
Detlefs DL (1990a) Concurrent garbage collection for C++, CMU-CS-90-119, School of Computer Science, Carnegie Mellon Univ., Pittsburgh, PA 15213.
Detlefs DL (1990b) Concurrent, atomic garbage collection, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Detlefs DL (1991) Concurrent, Atomic Garbage Collection, PhD Thesis, Dept of Computer Science, Carnegie Mellon Univ., Pittsburgh, PA 15213, CMU-CS-90-177, November 1991.
Demers A, Weiser M, Hayes B, Boehm H, Bobrow D and Shenker S (1990) Combining generational and conservative garbage collection: framework and implementations, in ACM Symposium on Principles of Programming Languages, 261-269.
DeTreville J (1990) Experience with garbage collection for Modula-2+ in the Topaz environment, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Deutsch LP and Bobrow DG (1976) An efficient, incremental, automatic garbage collector, Comm ACM 19(9), 522-526.
Deutsch LP (1983) The Dorado Smalltalk-80 implementation: hardware architecture's impact on software architecture, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 113-125.
Dijkstra EW, Lamport L, Martin AJ and Steffens EFM (1978) On-the-fly garbage collection: an exercise in cooperation, Comm ACM 21(11), 966-975.
Edelson D and Pohl I (1990) The case for garbage collector in C++, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
El-Habbash A, Horn C and Harris M (1990) Garbage collection in an object oriented, distributed, persistent environment, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Falcone JR and Stinger JR (1983) The Smalltalk-80 implementation at Hewlett-Packard, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 79-112.
Fenichel RR and Yochelson JC (1969) A LISP garbage-collector for virtual-memory computer systems, Comm ACM 12, 611-612.
Ferreira P (1990) Storage reclamation, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Field AJ and Harrison PG (1988) Functional Programming, Addison-Wesley.
Fisher DA (1974) Bounded workspace garbage collection in an address-order preserving list processing environment, Info. Processing Letters 3(1), 29-32.
Foderaro JK and Fateman RJ (1981) Characterization of VAX Macsyma, in Proceedings of the 1981 ACM Symposium on Symbolic and Algebraic Computation, 14-19.
Friedman DP and Wise DS (1976) Garbage collecting a heap which includes a scatter table, Info. Processing Letters 5(6), 161-164.
Friedman DP and Wise DS (1977) The one-bit reference count, BIT 17, 351-359.
Friedman DP and Wise DS (1979) Reference counting can manage the circular environments of mutual recursion, Info. Processing Letters 8(1), 41-45.
Gabriel RP and Masinter LM (1982) Performance of Lisp systems, ACM Symposium on Lisp and Functional Programming, Pittsburgh, Pennsylvania, 15-18 August 1982, 123-142.
Garnett NH and Needham RM (1980) An asynchronous garbage collector for the Cambridge file server, Operating System Review 14(4), 36-40.
Gelernter H, Hansen JR and Gerberich CL (1960) A FORTRAN-compiled list processing language, JACM 7(2), 87-101.
Goldberg A and Robson D (1983) Smalltalk-80: The Language and its Implementation, Addison-Wesley, 674-681.
Hansen WJ (1969) Compact list representation: definition, garbage collection, and system implementation, Comm ACM 12(9), 499.
Hayes B (1990) Open systems require conservative garbage collectors, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Hayes B (1991) Using key object opportunism to collect old objects, OOPSLA '91, SIGPLAN Notices 26(11), ACM, 33-46.
Hoare CAR (1974) Optimization of store size for garbage collection, Info. Processing Letters 2(6), 165-166.
Hudak P (1982) Object and Task Reclamation in Distributed Applicative Processing Systems, PhD Thesis, University of Utah.
Hudak P and Keller RM (1982) Garbage collection and task deletion in distributed applicative processing systems, ACM Symposium on Lisp and Functional Programming, Pittsburgh, Pennsylvania, August 1982, 168-178.
Hudak P (1986) A semantic model of reference counting and its abstraction (detailed summary), Proceedings of 1986 ACM Conference on Lisp and Functional Programming, Massachusetts Institute of Technology, 351-363.
Hudson R and Diwan A (1990) Adaptive garbage collection for Modula-3 and Smalltalk, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Hughes J (1984) Reference counting with circular structures in virtual memory applicative systems, TR Programming Research Group, Oxford Univ.
Hughes J (1985) A distributed garbage collection algorithm, in Functional Programming Languages and Computer Architecture, LNCS 201, Springer-Verlag, 256-272.
Johnson D (1991) The case for a read barrier, Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV), 96-107.
Jones SLP (1987) The Implementation of Functional Programming Languages, Prentice-Hall.
Jonkers HBM (1979) A fast garbage compaction algorithm, Info. Processing Letters 9(1), 26-30.
Juul NC (1990) Report on the ECOOP/OOPSLA '90 Workshop on Garbage Collection in Object-Oriented Systems.
Kafura D, Washabaugh D and Nelson J (1990) Garbage collection of actors, ECOOP/OOPSLA '90 Proceedings of Workshop on Garbage Collection, 126-134.
Kain RY (1969) Block structures, indirect addressing and garbage collection, Comm ACM 12(7), 395-398.
Knuth DE (1973) The Art of Computer Programming, Vol 1: Fundamental Algorithms, Addison-Wesley, Reading, Mass.
Kolodner E, Liskov B and Weihl W (1989) Atomic garbage collection: managing a stable heap, Proceedings of 1989 ACM SIGMOD International Conference on the Management of Data, 15-25.
Kolodner E (1991) Atomic incremental garbage collection and recovery for large stable heap, in Implementing Persistent Object Bases: Principles and Practice, Fourth International Workshop on Persistent Object Systems, Morgan Kaufmann Publishers, San Mateo, California.
Krasner G (ed) (1983) Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley.
Lang B, Queinnec C and Piquer J (1992) Garbage collecting the world, Proceedings of the 19th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '92).
Lermen CW and Maurer D (1986) A protocol for distributed reference counting, Proceedings of 1986 ACM Conference on Lisp and Functional Programming, Massachusetts Institute of Technology, 343-350.
Li K (1988a) Real-time concurrent collection in user mode, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Li K, Appel AW and Ellis JR (1988b) Real-time concurrent collection on stock multiprocessors, ACM SIGPLAN '88 Conference on Programming Language Design and Implementation, 11-20.
Lieberman H and Hewitt C (1983) A real-time garbage collector based on the lifetimes of objects, Comm ACM 26(6), 419-429.
Lindstrom G (1974) Copying list structures using bounded workspace, Comm ACM 17(4), 198-202.
Liskov B and Ladin R (1986) Highly-available distributed services and fault-tolerant distributed garbage collection, in Proceedings of the 5th Symposium on the Principles of Distributed Computing, ACM, 29-39.
Martin JJ (1982) An efficient garbage compaction algorithm, Comm ACM 25(8), 571-581.
McCarthy J (1960) Recursive functions of symbolic expressions and their computation by machine: Part I, Comm ACM 3(4), 184-195.
McCullough PL (1983) Implementing the Smalltalk-80 system: the Tektronix experience, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 59-78.
Meyers R and Casseres D (1983) An MC68000-based Smalltalk-80 system, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 175-187.
Miranda E (1987) BrouHaHa - a portable Smalltalk interpreter, OOPSLA '87, SIGPLAN Notices 22(12), ACM, 354-365.
Mohammed-Ali KA (1984) Object-Oriented Storage Management and Garbage Collection in Distributed Processing Systems, Academic Dissertation, Royal Institute of Technology, Dept of Computer Systems, Stockholm, Sweden.
Moon D (1984) Garbage collection in a large Lisp system, 1984 ACM Symposium on Lisp and Functional Programming, 235-246.
Morris FL (1978) A time- and space-efficient garbage compaction algorithm, Comm ACM 21(8), 662-665.
Morris FL (1979) On a comparison of garbage collection techniques, Comm ACM 22(10), 571.
Moss JEB (1990) Garbage collecting persistent object stores, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Newell A and Tonge FM (1960) An introduction to IPL-V, Comm ACM 3, 205-211.
Newman IA, Stallard RP and Woodward MC (1982) Performance of parallel garbage collection algorithms, Computer Studies 166, Sept 1982.
Nilsen K and Schmidt WJ (1990) Hardware support for garbage collection of linked objects and arrays in real time, ECOOP/OOPSLA '90 Workshop on Garbage Collection, October 1990.
Nilsen K (1988) Garbage collection of strings and linked data structures in real time, Software Practice and Experience 18(7), July 1988, 613-640.
North SC and Reppy JH (1987) Concurrent garbage collection on stock hardware, in Functional Programming Languages and Computer Architecture, LNCS 274, Springer-Verlag, 113-133.
ParcPlace (1991) Objectworks\Smalltalk Release 4 User's Guide, Memory Management, 229-237.
Queinnec C, Beaudoing B and Queille J (1989) Mark DURING sweep rather than mark THEN sweep, in PARLE '89 - Parallel Architectures and Languages Europe, LNCS 365, Springer-Verlag.
Rudalics M (1986) Distributed copying garbage collection, Proceedings of 1986 ACM Conference on Lisp and Functional Programming, Massachusetts Institute of Technology, 364-372.
Schelvis M and Bledoeg E (1988) The implementation of a distributed Smalltalk, ECOOP Proceedings, August 1988, LNCS 322.
Schelvis M (1989) Incremental distribution of timestamp packets: a new approach to distributed garbage collection, OOPSLA '89, SIGPLAN Notices 24(10), ACM, 37-48.
Schorr H and Waite WM (1967) An efficient machine-independent procedure for garbage collection in various list structures, Comm ACM 10(8), 501-506.
Sharma R and Soffa ML (1991) Parallel generational garbage collection, OOPSLA '91, SIGPLAN Notices 26(11), ACM, 16-32.
Shapiro M, Plainfosse D and Gruber O (1990) A garbage detection protocol for a realistic distributed object-support system, ECOOP/OOPSLA '90 Workshop on Garbage Collection, October 1990.
Standish TA (1980) Data Structures Techniques, Addison-Wesley, Reading, Mass.
Steele GL (1975) Multiprocessing compactifying garbage collection, Comm ACM 18(9), 495-508.
Thomas RE (1981) A dataflow computer with improved asymptotic performance, MIT Laboratory for Computer Science report MIT/LCS/TR-265.
Terashima M and Goto E (1978) Genetic order and compactifying garbage collector, Info. Processing Letters 7(1), 27-32.
Ungar DM and Patterson DA (1983) Berkeley Smalltalk: who knows where the time goes?, in Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley, 189-206.
Ungar D (1984) Generation scavenging: a non-disruptive high performance storage reclamation algorithm, in Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, April 1984, 157-167.
Ungar D and Jackson F (1988) Tenuring policies for generation-based storage reclamation, OOPSLA '88, SIGPLAN Notices 23(11), ACM, 1-17.
Ungar D and Jackson F (1992) An adaptive tenuring policy for generation scavengers, TOPLAS 14(1), 1-27.
Vestal SC (1987) Garbage Collection: An Exercise in Distributed, Fault-Tolerant Programming, PhD Thesis, Dept. of Computer Science, University of Washington, Seattle WA (USA), January 1987.
Wadler PL (1976) Analysis of an algorithm for real-time garbage collection, Comm ACM 19(9), 491-500.
Watson I (1986) An analysis of garbage collection for distributed systems, TR Dept Comp. Sc., U. Manchester.
Watson P and Watson I (1987) An efficient garbage collection scheme for parallel computer architecture, in PARLE '87 - Parallel Architectures and Languages Europe, LNCS 259, 432-443.
Wegbreit B (1972) A generalised compacting garbage collector, Computer Journal 15(3), 204-208.
Weizenbaum J (1962) Knotted list structures, Comm ACM 5(3), 161-165.
Weizenbaum J (1963) Symmetric list processor, Comm ACM 6(9), 524-544.
White JL (1980) Address/memory management for a gigantic Lisp environment, or GC considered harmful, Conference Record of the 1980 Lisp Conference, 119-127.
Wilson PR and Moher TG (1989) Design of the opportunistic garbage collector, OOPSLA '89, SIGPLAN Notices 24(10), ACM, 23-35.
Wilson PR (1990) Some issues and strategies in heap management and memory hierarchies, ECOOP/OOPSLA '90 Workshop on Garbage Collection.
Wilson PR, Lam MS and Moher TG (1990) Caching considerations for generational garbage collection: a case for large and set-associative caches, TR UIC-EECS-90-5, December 1990.
Wilson PR (1992) Comp.compiler Usenet discussion, February 1992.
Wilson PR, Lam MS and Moher TG (1991) Effective “static-graph” reorganization to improve locality in garbage-collected systems, Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Ontario, Canada, 177-191.
Wolczko M and Williams I (1990) Garbage collection in high-performance system, ECOOP/OOPSLA '90 Workshop on Garbage Collection, October 1990.
Woodward MC (1981) Multiprocessor garbage collection - a new solution, Computer Studies 115.
Zave DA (1975) A fast compacting garbage collector, Info. Processing Letters 3, 167-169.
Zorn B (1989) Comparative Performance Evaluation of Garbage Collection Algorithms, PhD Thesis, EECS Dept, UC Berkeley.
Zorn B (1990) Designing systems for evaluation: a case study of garbage collection, ECOOP/OOPSLA '90 Workshop on Garbage Collection.