
Distributed Watchpoints: Debugging Large Multi-Robot Systems

2007

Michael De Rosa, Student Member, IEEE, Jason Campbell, Senior Member, IEEE, Padmanabhan Pillai, Member, IEEE, Seth Goldstein, Senior Member, IEEE, Peter Lee, and Todd Mowry, Member, IEEE

De Rosa is with the School of Computer Science, Carnegie Mellon University, mderosa@cs.cmu.edu. Goldstein and Lee are with the School of Computer Science, Carnegie Mellon University, [seth,petel]@cs.cmu.edu. Campbell and Pillai are with Intel Research Pittsburgh, jason.campbell@intel.com, padmanabhan.s.pillai@intel.com. Mowry is with both CMU and Intel Research Pittsburgh, tcm@cs.cmu.edu.

Abstract— Tightly-coupled multi-agent systems such as modular robots frequently exhibit properties of interest that span multiple modules. These properties cannot easily be detected from any single module, though they might readily be detected by combining the knowledge of multiple modules. Testing for distributed conditions is especially important in debugging or verifying the correctness of software for modular robots. We have developed a technique we call distributed watchpoint triggers which can efficiently recognize such distributed conditions. Our watchpoint description language can handle a variety of temporal, spatial, and logical properties spanning multiple robots. This paper presents that language, describes our fully distributed, online mechanism for detecting distributed conditions in a running system, and evaluates the performance of our implementation. We found that the performance of the system is highly dependent on the program being debugged, scales linearly with ensemble size, and is small enough to make the system practical in all but the worst-case scenarios.

I. INTRODUCTION

Designing algorithms for distributed systems is a difficult and error-prone process. Concurrency, non-deterministic timing, and combinatorial explosion of possible states all contribute to the likelihood of bugs in even the most meticulously designed software. Likewise, these factors also make detection of bugs very difficult. Several modular robotics systems, such as Claytronics [1], envision very large distributed systems consisting of millions of modules, further exacerbating this problem.

Tools to assist programmers in debugging distributed algorithms are few, and generally inadequate. Most are forced to fall back on standard debugging methods, such as GDB [2] or logging through printf. GDB is useful for debugging errors local to an individual thread or process, but is not effective for errors resulting from the interactions or states of multiple threads of execution that span multiple modules. While printf may be used to detect some of these errors, this requires logging all potentially relevant state information at each robot, then centrally collecting, correlating, and post-processing the data to extract the details of the error condition. This requires significant effort, skill, and often luck on the part of the programmer. Additionally, both GDB and printf must be used very cautiously, or their file/console I/O can impose unintended serialization, altering the timing behavior of the robot ensemble and possibly masking some bug manifestations.

A. Related Work

In considering the design of a distributed debugging system for modular robots, there are three relevant areas of existing research to consider.
The first of these is work on distributed and parallel debugging, including the Chandy-Lamport global snapshot algorithm [4] and subsequent related work on global predicate evaluation [5]–[7]. Snapshotting and global predicate evaluation are valuable tools, but they are geared towards the problem of finding a single instance of a particular global configuration, whereas the conditions that manifest in modular robots are numerous and localized. Additionally, global snapshots require the aggregation of all relevant data at a central point, resulting in a large communications overhead [8]. Other important parallel debugging tools include static code analysis tools such as race detectors [9], [10], which can detect many (but not all) data races. While race detectors are important tools, they are not general debugging aids.

Another relevant research area is the development of logic-based verification/proof tools. Specifically, linear temporal logic (LTL) [11], a modal temporal logic, is capable of representing and reasoning on infinite state sequences, such as those that might be generated by FSM-style robot programs. This capability of LTL was exploited by Lamine et al. [12], who developed an LTL-based model verification tool for single mobile robots.

Finally, declarative overlay network systems, such as P2 [13], provide a general-purpose tool for the computation of distributed flow functions, which could include debugging primitives. In fact, P2 includes debugging support which leverages the system's ability to compute arbitrary distributed functions [14]. However, the focus of the P2 project is not on robotics, and as such it does not explicitly deal with the rapidly changing topologies inherent in modular robotics. Additionally, the use of P2 for debugging implies the adoption of the P2 programming paradigm, which may not be appropriate for all applications.

Fig. 1. Extended BNF grammar for the watchpoint description language:

    ⟨watchpoint⟩    → ⟨module decl.⟩ ⟨bool⟩
    ⟨module decl.⟩  → modules( ⟨string⟩+ )
    ⟨bool⟩          → not ⟨bool⟩ | ⟨bool⟩ and ⟨bool⟩ | ⟨bool⟩ or ⟨bool⟩ | neighbor( ⟨module⟩ ⟨module⟩ ) | ( ⟨compare⟩ ) | ( ⟨bool⟩ )
    ⟨module⟩        → ⟨string⟩ | ⟨module⟩.last | ⟨module⟩.next
    ⟨compare⟩       → ⟨state var⟩ ⟨op⟩ ⟨r val⟩
    ⟨state var⟩     → ⟨module⟩.⟨string⟩
    ⟨op⟩            → < | > | = | != | >= | <=
    ⟨r val⟩         → ⟨state var⟩ | ⟨numeric constant⟩

B. Distributed Watchpoints: Our Approach

The key to enabling effective distributed debugging is to allow programmers to easily specify and detect distributed conditions in a multi-robot ensemble. Such conditions constitute logical relations between state variables which are distributed both temporally and spatially across the ensemble. Generally, they cannot be detected by observing the state of any single robot, or even the whole system at any single point in time. For example, in debugging a distributed motion planning algorithm, we may wish to detect if two adjacent modules each initiate motion within four iterations of the algorithm's main loop. A tool which can detect such conditions can provide insights into the logical and temporal behavior of the system, and help pinpoint defects in distributed algorithms. To this end, we introduced the concept of a distributed watchpoint in [3].
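As a concrete illustration (an example of our own, not taken from [3]), one plausible way to encode the motion-planning condition above in the language of Figure 1, assuming a hypothetical state variable moving that a module sets to 1 on the iteration in which it initiates motion, is:

    Watchpoint Text: modules(a b); neighbor(a b) and (a.moving = 1) and ((b.moving = 1) or (b.last.moving = 1) or (b.last.last.moving = 1) or (b.last.last.last.moving = 1))

Because the module list is implicitly quantified over all connected sub-ensembles, both assignments of a and b are tried, so the disjunction over the current and three previous timesteps captures either module beginning to move first within a four-iteration window.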
The present paper extends that work by formalizing the watchpoint specification language (Section II), describing a fully distributed implementation of the watchpoint mechanism (Section III), providing some simple performance-enhancing optimizations (Section IV), and evaluating the performance of our approach (Section V).

II. DESCRIBING ERRORS

The first step in detecting distributed error conditions is to represent them effectively. To that end, we have created a simple watchpoint description language, based on a fragment of LTL [11] with the addition of predicates for state variable comparison and topological restriction (see Figure 1).

In representing distributed error conditions, we make a key assumption: the error must be able to be represented by a fixed-size, connected sub-ensemble of robots in specific states. Allowing disconnected sub-ensembles would imply an exponential search through all subsets of the total ensemble, and distributing information between the members of these subsets would require significant multi-hop messaging.

Watchpoint descriptions begin with a list of module names. This list defines the size of the matching sub-ensemble, and is implicitly quantified over all connected subgroups of this size in the ensemble. The language includes the standard boolean and grouping primitives, plus topological restrictions and state variable comparisons. Topological restrictions take the form neighbor(a b) and indicate that the two specified modules are neighbors. State variable comparisons allow for the comparison of named state variables in one module against constants, other local variables, or remote variables on other robots. Additionally, state variable comparisons may include arbitrary uses of the last and next temporal modal operators, which provide access to the past and future states of the robot's state variables. In the case of the next operator, this implies that the watchpoint triggers in the "future", and that the state of the robots would need to be rolled back one or more timesteps when the watchpoint triggers.

These simple primitives give us the ability to represent very complex distributed conditions. We can reason along three different axes of configuration: numeric state variables, topological configuration, and temporal progression. Topological restrictions allow us to model (in some abstract fashion) the configuration space of the robots, so that error states related to the physical positioning of neighboring modules may be represented. Temporal modal operators can be used to represent sequences of states, a useful capability for debugging distributed finite state automata.

We illustrate the utility of the watchpoint description language with two debugging examples: incorrect leader election and token passing. As shown in Figure 2, we have a hexagonally-packed array of robots which are attempting to select leaders using some (unspecified) leader election protocol. Each leader must have a path distance of at least two hops to any other leader.

Fig. 2. Incorrect 2-hop leader election. Conflicting leaders are circled. The watchpoint text shows the error condition where two leaders exist in the same two-hop radius.
    Watchpoint Text: modules(a b c); (a.isLeader = 1) and (c.isLeader = 1)
It is obvious from inspection of Figure 2 that the algorithm has yielded incorrect results, as there are two leaders within two hops of each other. While this is readily discernible from an omniscient perspective, any single robot will not be able to detect this error condition without communicating with its neighbors. To represent this error state, we use the watchpoint in Figure 2. Deconstructing the watchpoint expression, we see it specifies three modules, with a linear path from a to c, and where both a and c are leaders. A match for this watchpoint indicates a violation of the path-distance criterion given above.

As a slightly more complex example, let us consider the problem of token passing in a ring network (see Figure 3). We would like to enforce the condition that, if robot x has the token, then exactly one of its neighbors must have had the token in the last timestep. We can express the violation of this condition with the watchpoint shown in Figure 3. Here the watchpoint again specifies three modules, with module x currently holding the token. An error occurs if both or neither of x's neighbors previously had the token. Note that we do not need to use topological restriction in this watchpoint, as the requirement that x, a, b form a connected sub-ensemble is sufficient.

Fig. 3. Part of a token-passing ring. Previous states shown stacked behind current ones. The watchpoint text corresponds to error conditions where zero or two modules previously had the token.
    Watchpoint Text: modules(a x b); neighbor(a x) and neighbor(x b) and (x.tok = 1) and (((a.last.tok = 1) and (b.last.tok = 1)) or ((a.last.tok = 0) and (b.last.tok = 0)))

III. DETECTING ERRORS

We consider a simplified machine model for each modular robot: each robot is represented by a number of named integer state variables, and an array of neighbors. We assume that each robot iterates through three atomic phases: computation, state variable assignment, and communication. Computation may take an arbitrary amount of time, and each robot can communicate only with its immediate neighbors. Furthermore, we assume that each robot has a copy of the watchpoint, and that each robot has the relevant local state variables needed by the watchpoint. We explicitly do not require that all robots have the same code image, merely that they have compatible state variables. This simplified model does not entail a large loss of generality, as we can express most run-loop, finite-automata, and event-driven programs within it.
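To make this machine model concrete, the following minimal sketch (our illustration; the names are hypothetical and not taken from the authors' implementation) shows the per-robot state that the watchpoint mechanism assumes: named integer state variables with a bounded history, and a list of immediate neighbors.

    // Minimal sketch of the simplified robot model described above (names are hypothetical).
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <map>
    #include <string>
    #include <vector>

    struct RobotState {
        std::map<std::string, std::deque<int64_t>> vars;  // per-variable history, front() = newest value
        std::vector<int> neighbors;                       // IDs of immediately adjacent modules

        // State variable assignment phase: record a new value, retaining only as much
        // history as the variable's temporal span (see Section IV-A) requires.
        void assign(const std::string& name, int64_t value, std::size_t span) {
            auto& history = vars[name];
            history.push_front(value);
            while (history.size() > span + 1) history.pop_back();
        }
    };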
The main component in our distributed watchpoint implementation is the PatternMatcher object (Figure 4). A PatternMatcher consists of two subunits: a set of named slots that hold robot ID numbers, and an expression tree that both represents the watchpoint expression and stores any accumulated state variables. A PatternMatcher may be empty (with none of its slots filled), partially filled (with some slots and state variables filled), or completely filled. A given PatternMatcher may be in one of three states: matched, failed, or indeterminate. The indeterminate state occurs when there is insufficient information in a partially filled PatternMatcher to decide whether its expression is satisfied. Once a PatternMatcher has matched, the error condition has been detected at the final robot added to the sub-ensemble, and an arbitrary action can be executed. This can be as simple as halting the robots, or as complicated as initiating some expensive logging or recovery operation.

Fig. 4. PatternMatcher object for the watchpoint example in Figure 3. Empty variables shown with dotted outlines. At each step, the first empty slot is filled with the local node ID, and corresponding variables filled with local data. Copies of a partially filled PatternMatcher propagate to neighboring nodes until the expression tree can be definitively evaluated.

A. Centralized Implementation

Our initial implementation, introduced in [3], relied on a single centralized procedure to update all PatternMatchers across an entire (simulated) ensemble. The watchpoint system maintains a set of vectors for each robot's state variables. At each timestep, the current values of all state variables used by active watchpoints are appended to the vectors, providing state history for the variables. The simulator also maintains a single set of PatternMatchers (S), which are updated and processed every timestep as described in Algorithm 1.

Algorithm 1 Centralized Watchpoint Update
    S = ∅
    for all modules m do
        create new PatternMatcher p from the watchpoint text
        fill p's first slot with m
        S = S ∪ {p}
    end for
    while S ≠ ∅ do
        T = ∅
        for all PatternMatchers p ∈ S do
            if p matches then
                execute trigger action for watchpoint
            else if p is indeterminate then
                for all neighbors n of modules in p's slots do
                    p1 = clone(p)
                    fill p1's first open slot with n
                    T = T ∪ {p1}
                end for
            end if
            S = S − {p}
        end for
        S = T
    end while

B. Distributed Implementation

For our distributed implementation of watchpoint functionality, rather than having one central state vector and PatternMatcher array, each robot maintains its own state history and set of active PatternMatchers. Robots then independently (and asynchronously) execute two behaviors:
1) When an incoming message is received containing a PatternMatcher, the robot fills the PatternMatcher's next open slot with its information, and adds it to a local set S of active PatternMatchers.
2) Each timestep, every robot m updates its local state information and then runs Algorithm 2 to process any active PatternMatchers.

Algorithm 2 Distributed Watchpoint Update
    create new PatternMatcher p from the watchpoint text
    fill p's first slot with local module m
    S = S ∪ {p}
    for all PatternMatchers p ∈ S do
        if p matches then
            execute trigger action for watchpoint
        else if p is indeterminate then
            for all neighbors n of m that are not already in p do
                p1 = clone(p)
                send p1 to n via messaging system
            end for
        end if
        S = S − {p}
    end for

We note that this algorithm limits the topologies of triggering sub-ensembles to linear chains that match the watchpoint's variables in order. This is intentional, as it removes the need for multi-hop communication in trigger ensembles with non-linear or non-ordered topologies (Figure 5). We are currently working to remove this limitation, at the cost of increased latency.
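The sketch below (our own simplification; names and signatures are hypothetical, and the expression tree is reduced to a stub) illustrates the per-robot processing loop of Algorithm 2: evaluate each active PatternMatcher with three-valued logic, execute the trigger action on a match, forward copies of indeterminate matchers to neighbors that are not yet bound to a slot, and discard the rest.

    // Simplified, hypothetical sketch of the per-robot update in Algorithm 2.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <vector>

    enum class MatchState { Matched, Failed, Indeterminate };

    struct PatternMatcher {
        std::vector<int> slots;    // module IDs bound so far, in watchpoint order
        std::size_t numSlots = 3;  // slot count declared by modules(...)

        bool contains(int id) const {
            return std::find(slots.begin(), slots.end(), id) != slots.end();
        }
        // Three-valued evaluation of the watchpoint expression over the state captured
        // so far. The real system walks the expression tree; this stub only
        // distinguishes "all slots filled" from "still growing".
        MatchState evaluate() const {
            if (slots.size() < numSlots) return MatchState::Indeterminate;
            return MatchState::Matched;  // a full tree could also return Failed here
        }
    };

    void processLocalMatchers(const std::vector<int>& neighbors,
                              std::vector<PatternMatcher>& active,
                              const std::function<void(const PatternMatcher&)>& trigger,
                              const std::function<void(int, PatternMatcher)>& send) {
        for (const PatternMatcher& pm : active) {
            switch (pm.evaluate()) {
                case MatchState::Matched:
                    trigger(pm);                      // error condition detected here
                    break;
                case MatchState::Indeterminate:
                    for (int n : neighbors)           // clone to each eligible neighbor
                        if (!pm.contains(n)) send(n, pm);
                    break;
                case MatchState::Failed:
                    break;                            // no chance of matching: drop it
            }
        }
        active.clear();  // matchers persist only as the copies just forwarded
    }

Receiving a forwarded copy corresponds to behavior 1) above: the receiving robot binds itself to the next open slot and adds the copy to its own active set before its next update.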
It is interesting to note that, as we store relevant state information in the expression tree of the PatternMatcher while it migrates between robots, the above algorithm is equivalent to a Chandy-Lamport snapshot [4] of bounded radius, as the message carrying the PatternMatcher serves as both a snapshot beacon and data aggregator. The lack of synchronization between modules implies that we can obtain only consistent sets of states, not simultaneous sets of states, as simultaneity is ill-defined in asynchronous distributed systems.

Fig. 5. Non-linear (a) and non-ordered (b) sub-ensembles for the set of modules (a b c d e).

IV. OPTIMIZATIONS

To reduce the storage, processing, and communications demands of our watchpoint system, we implement three optimizations: temporal span detection, early termination of candidate pattern matchers, and aggressive neighbor culling.

A. Temporal Span Detection

For each state variable, we must determine the minimum amount of history that must be maintained by each robot. We call this quantity the temporal span of the variable. Additionally, we must determine the minimum amount of total state (all state variables plus neighbor information) that must be maintained to allow for watchpoints that trigger in the future. This is the temporal extent of the system.

To calculate the temporal span of a variable, we inspect each use of that variable in the watchpoint expression. For each use, we calculate the temporal extent by assigning a value of +1 to each next occurrence, and a value of −1 to each last. The sum is then the temporal extent for that particular use of the variable. The temporal span for the variable is the maximal difference between any two temporal extents; this is the amount of history that must be maintained for that variable. Similarly, the maximum positive extent over all variables specifies the size of the total state vector that must be maintained, as it bounds how far the watchpoint must "rewind" for expressions that use the next operator.

B. Early Termination

To reduce the number of active PatternMatchers, and thus the bandwidth and processing cost of the algorithm, we aggressively cull PatternMatchers that have no chance of succeeding. Whenever we generate new children, we first check whether the parent's expression tree can never match. This can happen even if the PatternMatcher is not completely filled, as subclauses of the expression tree may have become unsatisfiable. If this condition is detected, the parent is deleted, and no children are created.
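As an illustration of this early-termination check (again our own sketch, not the authors' code), the three-valued evaluation below shows how a partially filled expression tree can already be definitively false: comparisons that involve unfilled slots evaluate to Unknown, but an and node with any False child fails immediately, allowing the parent PatternMatcher to be deleted before any children are cloned.

    // Hypothetical sketch of three-valued expression evaluation for early termination.
    #include <memory>
    #include <vector>

    enum class Tri { True, False, Unknown };  // three-valued logic

    struct Expr {
        enum Kind { And, Or, Not, Leaf } kind = Leaf;
        std::vector<std::unique_ptr<Expr>> kids;
        Tri leafValue = Tri::Unknown;  // comparisons over unfilled slots stay Unknown

        Tri eval() const {
            switch (kind) {
                case Leaf: return leafValue;
                case Not: {
                    Tri v = kids[0]->eval();
                    if (v == Tri::Unknown) return Tri::Unknown;
                    return v == Tri::True ? Tri::False : Tri::True;
                }
                case And: {
                    bool unknown = false;
                    for (const auto& k : kids) {
                        Tri v = k->eval();
                        if (v == Tri::False) return Tri::False;  // subclause unsatisfiable: cull the matcher
                        if (v == Tri::Unknown) unknown = true;
                    }
                    return unknown ? Tri::Unknown : Tri::True;
                }
                case Or: {
                    bool unknown = false;
                    for (const auto& k : kids) {
                        Tri v = k->eval();
                        if (v == Tri::True) return Tri::True;
                        if (v == Tri::Unknown) unknown = true;
                    }
                    return unknown ? Tri::Unknown : Tri::False;
                }
            }
            return Tri::Unknown;
        }
    };

A matcher whose tree evaluates to False is deleted outright; only trees that remain Unknown justify cloning children to neighboring modules.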
C. Neighbor Culling

Finally, we reduce the set of neighbors to which a given PatternMatcher can spread by examining the topological constraints of the expression tree. If the constraint neighbor(a b) exists in the watchpoint, b is the next open slot, and a is already filled, then the PatternMatcher can only spread to neighbors of a. In the case of multiple topological restrictions, we generate a set of possible neighbors by traversing the tree from the bottom up, treating and as set intersection and or as set union. To facilitate this, we add a set of 1-hop neighbors for each slot in the PatternMatcher, allowing for fast local computation of set operations.

V. EVALUATION

We evaluated the algorithms using DPRSim [15], our massively multithreaded multi-robot simulator. Evaluation was performed on dual-core machines under Linux. To more accurately measure the differential overhead of watchpoint support, we disabled the physics and graphics rendering portions of the simulator. All tests were performed on ensembles of 100 to 1000 modules arranged in a cubic lattice packing in 10 by 10 stacked planes. Simulations were conducted for 100 virtual timesteps, where each timestep allowed for arbitrary computation, including message transmission/reception. Message travel time was 1 timestep. Test configurations were monitored for total execution time, number of active PatternMatchers (segmented by number of slots filled), and number of successful matches.

A. Host Program Behavior

When evaluating the performance of the algorithms, we were immediately struck by how dependent runtimes were on the behavior of the host program being debugged. To illustrate this, we used the watchpoint shown in Figure 6. x1 through x4 are four independent, uniformly-distributed integer random variables generated by the host program. Each variable xn ranges over the integral values from 0 to maxxn − 1, but matches the test watchpoint only when the value is precisely 0. We can thus represent the behavior of the host program with the tuple [maxx1 : maxx2 : maxx3 : maxx4]. For example, the tuple [2:2:2:2] represents a host program that causes half of all PatternMatchers to be discarded after each slot is filled. In contrast, the tuple [4:1:1:1] describes the case where 3/4 of all PatternMatchers are discarded after filling the first slot, but then all remaining PatternMatchers survive until they are fully filled, at which point they match.

Fig. 6. Performance evaluation watchpoint.
    Watchpoint Text: modules(a b c d); (a.x1 = 0) and (b.x2 = 0) and (c.x3 = 0) and (d.x4 = 0)

With this test case we can now examine how variation in host program behavior impacts the number of active PatternMatchers (and thus the execution time). We begin with the "worst case" tuple, [1:1:1:1], where all generated PatternMatchers will always survive, leading to an exponential explosion of PatternMatchers as seen at the top of Figure 7. After 100 timesteps on 1000 robots, over 6 million successful PatternMatchers have accumulated. We can easily halve the number of active PatternMatchers by using the tuple [2:1:1:1], which halves the number of PatternMatchers that survive having the first slot filled. Halving the number of active PatternMatchers at each step (as in [2:2:2:2]) results in the expected decrease by a factor of 16. Finally, comparing [1:2:4:8] and [8:4:2:1] is quite instructive. Both eventually generate an almost identical number of successful matches, but over wildly different trajectories. [1:2:4:8], which culls most of its PatternMatchers after the last slot has been filled, is much less efficient than [8:4:2:1]. This can be seen at the bottom of Table I, where one takes almost 6 times as long as the other. Continuing to examine Table I, we note a general linear increase in the amount of time taken by the distributed algorithm as the number of successful matches increases.

Fig. 7. Total number of PatternMatchers generated versus number of slots filled after 100 timesteps of execution for the 1000-node ensemble described in Section V. Note that certain host program behavior triggers an exponential increase in the number of PatternMatchers.

TABLE I
SUCCESSFUL MATCHES VS. EXECUTION TIME FOR THE 1000-NODE ENSEMBLE DESCRIBED IN SECTION V

    Program tuple    # matches    time (secs)
    [1:1:1:1]        6233141      715
    [2:1:1:1]        3119538      358
    [2:2:2:2]         394103      128
    [1:2:4:8]          96507      183
    [8:4:2:1]          96390       32
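To make the tuple notation concrete, the following sketch (ours; hypothetical code, but matching the description of x1 through x4 above) draws the four state variables each timestep. Under [2:2:2:2] each clause (xi = 0) of the Figure 6 watchpoint holds with probability 1/2, so roughly 1/16 of the fully expanded PatternMatchers should match; this is consistent with the drop from 6233141 matches for [1:1:1:1] to 394103 for [2:2:2:2] in Table I.

    // Hypothetical sketch of the synthetic host program behavior: every timestep a
    // robot redraws x1..x4 uniformly from {0, ..., maxxi - 1}; the tuple
    // [maxx1 : maxx2 : maxx3 : maxx4] parameterizes how often each clause (xi = 0) holds.
    #include <array>
    #include <random>

    std::array<int, 4> drawHostState(std::mt19937& rng, const std::array<int, 4>& maxX) {
        std::array<int, 4> x{};
        for (int i = 0; i < 4; ++i) {
            std::uniform_int_distribution<int> dist(0, maxX[i] - 1);  // inclusive bounds
            x[i] = dist(rng);
        }
        return x;
    }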
The host program behavior dependency that we have illustrated above is due to a simple fact: there is an exponential (in the number of slots) number of PatternMatchers generated by the spreading of each PatternMatcher to all of a robot's neighbors after each slot is filled. By shifting the criteria that are least frequently true to early in the watchpoint evaluation, we can dramatically cut execution time, even though the total number of successful matches remains the same.

B. Execution Overhead

We also analyzed the overall execution time of the algorithms, and their scaling behavior as the size of the robot ensemble grows (Figure 8). In these tests, we used the same watchpoint expression as above, with the host program generating random variables according to the [2:2:2:2] scheme. Each robot also ran a data aggregation and landmark routing program, to simulate a medium-intensity workload on the system. Tests were run on ensembles of various sizes, using the centralized and distributed implementations, as well as without any watchpoints enabled (for comparison). We note that all three datasets scale linearly in time as the ensemble size increases. The centralized algorithm required a mean overhead of 118%, while the overhead of the distributed version was only 105%. The distributed implementation must do at least as much work as the centralized algorithm, plus the cost of messaging, so the speedup in the distributed case was quite counterintuitive, until we realized that the distributed implementation was naturally taking advantage of the potential for parallel execution on the dual-core test system. The overhead for both algorithms is well within the range of other debugging tools like GDB [2] and Valgrind [16].

Fig. 8. A medium-intensity workload involving unrelated distributed algorithms. It runs simultaneously with the watchpoint system to provide a baseline for the overhead of both the centralized and distributed detection algorithms.

C. Varying Watchpoint Sizes

Finally, we analyzed the overhead of the watchpoint system as the size of the expression grew. Using a similar expression to that in Figure 6, we varied the number of modules in the expression from 1 to 5. We executed the watchpoints on the centralized algorithm, using a cube of 1000 modules over 100 timesteps. For each expression size, we evaluated the overhead using low- (maxxi = 100), medium- (maxxi = 8), and high-frequency (maxxi = 2) program behaviors. We note that the overhead for the low-frequency behavior is dominated by the random chance that a matching sub-ensemble exists, and thus varies only a little as expression size increases. As shown in Table II, medium- and high-frequency behaviors show a noticeable increase in overhead as expression size increases, which corresponds to the exponential increase in the number of PatternMatchers that one would expect as the expression size increases.

TABLE II
WATCHPOINT OVERHEAD AS EXPRESSION SIZE VARIES FOR AN ENSEMBLE OF 1000 ROBOTS

    Expression Size    Low (maxxi = 100)    Medium (maxxi = 8)    High (maxxi = 2)
    1                    7.6%                  8.6%                   3.6%
    2                   25.2%                 32.4%                  54.4%
    3                   39.2%                 33.0%                 104.7%
    4                   42.6%                 52.0%                 369.8%
    5                   23.0%                 83.4%                1432.5%

D. Hardware Requirements

The resources required to implement this technique in real, rather than simulated, modular robots are modest. Memory needs per module would typically be tens of kilobytes or less (including storage for pattern matchers plus local state memory). Likewise, code size could be modest: our full distributed implementation is less than 1500 lines of C++.
In many cases the required communications could be piggybacked onto other pre-existing messages between modules, and since exchanges are limited to nearest neighbors, many designs would be able to take advantage of neighbor-to-neighbor wired or infrared links for such data. The most constrained resource would probably be processor cycles for systems already operating close to their computational or communications limits. In those cases, the additional load of transmitting and processing pattern matchers could require the system to break potential real-time constraints. It is increasingly feasible, however, to provision all but the smallest robot modules with powerful processors.

VI. CONCLUSIONS

We have demonstrated two significant contributions: the ability to express a large class of distributed error conditions, and two algorithms to detect these conditions both in simulation and in real robotic ensembles. Our watchpoint description language allows for the expression of complex distributed conditions along three different axes of configuration: numeric state variables, topological configuration, and temporal progression. We describe two algorithms, a centralized algorithm and a distributed algorithm, which evaluate the watchpoints over all connected sub-ensembles in the system.

Both of the presented algorithms have execution overheads low enough to make them practical. The main component of the overheads is directly related to the number of PatternMatcher objects that are generated and propagated through our system. Thus, we found that the overhead of the system depends heavily on the host program being monitored and the structure of the watchpoint. The sooner it can be shown that a particular PatternMatcher object cannot trigger the watchpoint, the fewer PatternMatcher objects are spawned. In the worst case, an exponential number of PatternMatchers will be spawned, which can lead to a significant slowdown (about fourteen times on our most extreme example). Note, however, that the user can control the amount of overhead introduced by structuring the watchpoint to fail early. We have found that reasonable watchpoints introduce overhead of 100% or less. This is on par with overheads from such powerful (and heavily used) tools as Valgrind [16], and a small price to pay for the power of finding bugs which involve multiple robots in the ensemble.

ACKNOWLEDGMENTS

This research was sponsored by the National Science Foundation (NSF) under grant number CNS 0428738 (ITR: Synthetic Reality). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government, or any other entity. The authors wish to thank Casey Helfrich and Michael Ryan for developing the simulator, Benjamin Rister for providing a sample workload, Deepak Garg and Frank Pfenning for a tutorial on LTL, and David O'Halloran and TianKai Tu for an insightful discussion on current practices in parallel debugging.

REFERENCES

[1] S. Goldstein, J. Campbell, and T. Mowry, "Programmable matter," IEEE Computer, vol. 38, no. 6, pp. 99–101, May 2005.
[2] GDB: The GNU Project Debugger. [Online]. Available: http://www.gnu.org/software/gdb/
[3] M. De Rosa, S. Goldstein, P. Lee, J. Campbell, and P. Pillai, "Distributed watchpoints: Debugging very large ensembles of robots (extended abstract)," in RSS'06 Workshop on Self-Reconfigurable Modular Robotics, August 2006.
[4] K. M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Transactions on Computer Systems, vol. 3, no. 1, pp. 63–75, February 1985.
[5] C. M. Chase and V. K. Garg, "Detection of global predicates: Techniques and their limitations," Distributed Computing, vol. 11, no. 4, pp. 191–201, 1998.
[6] E. Fromentin, M. Raynal, V. K. Garg, and A. I. Tomlinson, "On the fly testing of regular patterns in distributed computations," in International Conference on Parallel Processing, 1994, pp. 73–76.
[7] M. Hurfin, M. Mizuno, M. Raynal, and M. Singhal, "Efficient distributed detection of conjunctions of local predicates," Software Engineering, vol. 24, no. 8, pp. 664–677, 1998.
[8] Z. Yang and T. A. Marsland, "Global snapshots for distributed debugging," in International Conference on Computing and Information, 1992, pp. 436–440.
[9] S. Carr, J. Mayo, and C.-K. Shene, "Race conditions: A case study," The Journal of Computing in Small Colleges, vol. 17, no. 1, pp. 88–102, October 2001.
[10] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson, "Eraser: A dynamic data race detector for multithreaded programs," ACM Transactions on Computer Systems, vol. 15, no. 4, pp. 391–411, 1997.
[11] A. Pnueli, "The temporal logic of programs," in Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, 1977, pp. 46–67.
[12] K. B. Lamine and F. Kabanza, "Reasoning about robot actions: A model checking approach," in Advances in Plan-Based Control of Robotic Agents, 2002, pp. 123–139.
[13] B. Loo, T. Condie, J. Hellerstein, P. Maniatis, T. Roscoe, and I. Stoica, "Implementing declarative overlays," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2005.
[14] A. Singh, T. Roscoe, P. Maniatis, and P. Druschel, "Using queries for distributed monitoring and forensics," in Proceedings of EuroSys 2006, 2006, pp. 389–402.
[15] DPRSim. [Online]. Available: http://www.pittsburgh.intel-research.net/dprweb/
[16] N. Nethercote and J. Seward, "Valgrind: A program supervision framework," Electronic Notes in Theoretical Computer Science, 2003.