Lecture Notes in Computer Science 3855

Commenced publication in 1973.
Founding and former series editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board:
David Hutchison (Lancaster University, UK)
Takeo Kanade (Carnegie Mellon University, Pittsburgh, PA, USA)
Josef Kittler (University of Surrey, Guildford, UK)
Jon M. Kleinberg (Cornell University, Ithaca, NY, USA)
Friedemann Mattern (ETH Zurich, Switzerland)
John C. Mitchell (Stanford University, CA, USA)
Moni Naor (Weizmann Institute of Science, Rehovot, Israel)
Oscar Nierstrasz (University of Bern, Switzerland)
C. Pandu Rangan (Indian Institute of Technology, Madras, India)
Bernhard Steffen (University of Dortmund, Germany)
Madhu Sudan (Massachusetts Institute of Technology, MA, USA)
Demetri Terzopoulos (New York University, NY, USA)
Doug Tygar (University of California, Berkeley, CA, USA)
Moshe Y. Vardi (Rice University, Houston, TX, USA)
Gerhard Weikum (Max-Planck Institute of Computer Science, Saarbruecken, Germany)

E. Allen Emerson, Kedar S. Namjoshi (Eds.)

Verification, Model Checking, and Abstract Interpretation
7th International Conference, VMCAI 2006
Charleston, SC, USA, January 8-10, 2006
Proceedings

Volume Editors:
E. Allen Emerson, University of Texas at Austin, Department of Computer Science, Austin, TX 78712, USA. E-mail: emerson@cs.utexas.edu
Kedar S. Namjoshi, Bell Labs, Lucent Technologies, 600 Mountain Avenue, Murray Hill, NJ 07974, USA. E-mail: kedar@research.bell-labs.com

Library of Congress Control Number: 2005937944
CR Subject Classification (1998): F.3.1-2, D.3.1, D.2.4
LNCS Sublibrary: SL 1 - Theoretical Computer Science and General Issues
ISSN: 0302-9743
ISBN-10: 3-540-31139-4 (Springer Berlin Heidelberg New York)
ISBN-13: 978-3-540-31139-3 (Springer Berlin Heidelberg New York)

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media (springer.com). © Springer-Verlag Berlin Heidelberg 2006. Printed in Germany on acid-free paper. Typesetting: camera-ready by author; data conversion by Scientific Publishing Services, Chennai, India.

Preface

This volume contains the papers accepted for presentation at the 7th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI 2006), held January 8-10, 2006, in Charleston, South Carolina, USA. VMCAI provides a forum for researchers from the communities of verification, model checking, and abstract interpretation, facilitating interaction, cross-fertilization, and the advancement of hybrid methods. The program was selected from 58 submitted papers. In addition, the program included invited talks by Edmund M. Clarke (Carnegie Mellon University), James R. Larus (Microsoft Research), and Greg Morrisett (Harvard University), and invited tutorials by Nicolas Halbwachs (VERIMAG) and David Schmidt (Kansas State University). VMCAI 2006 was sponsored by the University of Texas at Austin, with additional support from Microsoft Research and NEC Research Labs. We are grateful for the support.
We would like to thank the Program Committee and the reviewers for their hard work and dedication in putting together this program. We especially wish to thank Jacob Abraham, director of the Computer Engineering Research Center at the University of Texas at Austin, and Debi Prather for their invaluable support and assistance, and Rich Gerber for his help with the START conference management system.

January 2006
E. Allen Emerson
Kedar S. Namjoshi

Organization

Program Committee:
Alex Aiken (Stanford University, USA)
Thomas Ball (Microsoft Research, USA)
Hana Chockler (IBM Research, Israel)
Patrick Cousot (École Normale Supérieure, France)
E. Allen Emerson, Co-chair (University of Texas at Austin, USA)
Javier Esparza (University of Stuttgart, Germany)
Roberto Giacobazzi (University of Verona, Italy)
Patrice Godefroid (Bell Laboratories, USA)
Warren Hunt (University of Texas at Austin, USA)
Neil Jones (DIKU, University of Copenhagen, Denmark)
Tiziana Margaria (University of Göttingen, Germany)
Markus Müller-Olm (University of Münster, Germany)
Kedar S. Namjoshi, Co-chair (Bell Laboratories, USA)
George Necula (University of California at Berkeley, USA)
Jens Palsberg (UCLA, USA)
Andreas Podelski (Max-Planck-Institut für Informatik, Germany)
Thomas W. Reps (University of Wisconsin, USA)
A. Prasad Sistla (University of Illinois at Chicago, USA)
Colin Stirling (University of Edinburgh, UK)
Scott D. Stoller (SUNY at Stony Brook, USA)
Lenore Zuck (University of Illinois at Chicago, USA)

Steering Committee:
Agostino Cortesi (University of Venice, Italy)
Patrick Cousot (École Normale Supérieure, France)
E. Allen Emerson (University of Texas at Austin, USA)
Giorgio Levi (University of Pisa, Italy)
Andreas Podelski (Max-Planck-Institut für Informatik, Germany)
Thomas W. Reps (University of Wisconsin, USA)
David Schmidt (Kansas State University, USA)
Lenore Zuck (University of Illinois at Chicago, USA)

Sponsoring Institutions: The University of Texas at Austin, Microsoft Research, NEC Research Labs

Referees: Parosh Abdulla, Gilad Arnold, Gadiel Auerbach, James Avery, Bernard Boigelot, Ahmed Bouajjani, Glenn Bruns, Scott Cotton, Dennis Dams, Jared Davis, John Erickson, Xiang Fu, Maria del Mar Gallardo, Peter Habermehl, Brian Hackett, Tom Henzinger, Bertrand Jeannet, Robert Krug, Viktor Kuncak, Orna Kupferman, Alexander Malkis, Damien Masse, Isabella Mastroeni, Alessio Merlo, David Monniaux, Anders Møller, Serita Neleson, Ziv Nevo, Tobias Nipkow, Avigail Orni, Julien d'Orso, German Puebla, Mila Dalla Preda, David Rager, C.R. Ramakrishnan, Francesco Ranzato, Sandip Ray, Erik Reeber, Xavier Rival, Oliver Rüthing, Giovanni Scardoni, Stefan Schwoon, Roberto Segala, Zhong Shao, Jakob Grue Simonsen, Viorica Sofronie-Stokkermans, Fausto Spoto, Bernhard Steffen, Sol Swords, Wenkai Tan, Francesco Tapparo, Tachio Terauchi, Helmut Veith, Mahesh Viswanathan, Hubert Wagner, Thomas Wies, Daniel Wilkerson, Ping Yang, Pei Ye, Karen Yorav, Greta Yorsh, Damiano Zanardini, Qiang Zhang, Min Zhou

Table of Contents

Closure Operators for ROBDDs (Peter Schachte, Harald Søndergaard)
A CLP Method for Compositional and Intermittent Predicate Abstraction (Joxan Jaffar, Andrew E. Santosa, Răzvan Voicu)
Combining Shape Analyses by Intersecting Abstractions (Gilad Arnold, Roman Manevich, Mooly Sagiv, Ran Shaham)
A Complete Abstract Interpretation Framework for Coverability Properties of WSTS (Pierre Ganty, Jean-François Raskin, Laurent Van Begin)
Complexity Results on Branching-Time Pushdown Model Checking (Laura Bozzelli)
A Compositional Logic for Control Flow (Gang Tan, Andrew W. Appel)
Detecting Non-cyclicity by Abstract Compilation into Boolean Functions (Stefano Rossignoli, Fausto Spoto)
Efficient Strongly Relational Polyhedral Analysis (Sriram Sankaranarayanan, Michael A. Colón, Henny B. Sipma, Zohar Manna)
Environment Abstraction for Parameterized Verification (Edmund Clarke, Muralidhar Talupur, Helmut Veith)
Error Control for Probabilistic Model Checking (Håkan L.S. Younes)
Field Constraint Analysis (Thomas Wies, Viktor Kuncak, Patrick Lam, Andreas Podelski, Martin Rinard)
A Framework for Certified Program Analysis and Its Applications to Mobile-Code Safety (Bor-Yuh Evan Chang, Adam Chlipala, George C. Necula)
Improved Algorithm Complexities for Linear Temporal Logic Model Checking of Pushdown Systems (Katia Hristova, Yanhong A. Liu)
A Logic and Decision Procedure for Predicate Abstraction of Heap-Manipulating Programs (Jesse Bingham, Zvonimir Rakamarić)
Monitoring Off-the-Shelf Components (A. Prasad Sistla, Min Zhou, Lenore D. Zuck)
Parallel External Directed Model Checking with Linear I/O (Shahid Jabbar, Stefan Edelkamp)
Piecewise FIFO Channels Are Analyzable (Naghmeh Ghafari, Richard Trefler)
Ranking Abstraction of Recursive Programs (Ittai Balaban, Ariel Cohen, Amir Pnueli)
Relative Safety (Joxan Jaffar, Andrew E. Santosa, Răzvan Voicu)
Resource Usage Analysis for the π-Calculus (Naoki Kobayashi, Kohei Suenaga, Lucian Wischik)
Semantic Hierarchy Refactoring by Abstract Interpretation (Francesco Logozzo, Agostino Cortesi)
Strong Preservation of Temporal Fixpoint-Based Operators by Abstract Interpretation (Francesco Ranzato, Francesco Tapparo)
Symbolic Methods to Enhance the Precision of Numerical Abstract Domains (Antoine Miné)
Synthesis of Reactive(1) Designs (Nir Piterman, Amir Pnueli, Yaniv Sa'ar)
Systematic Construction of Abstractions for Model-Checking (Arie Gurfinkel, Ou Wei, Marsha Chechik)
Totally Clairvoyant Scheduling with Relative Timing Constraints (K. Subramani)
Verification of Well-Formed Communicating Recursive State Machines (Laura Bozzelli, Salvatore La Torre, Adriano Peron)
What's Decidable About Arrays? (Aaron R. Bradley, Zohar Manna, Henny B. Sipma)
Author Index

Closure Operators for ROBDDs

Peter Schachte and Harald Søndergaard
Department of Computer Science and Software Engineering, The University of Melbourne, Vic. 3010, Australia
(Peter Schachte's work has been supported in part by NICTA Victoria Laboratories.)

Abstract. Program analysis commonly makes use of Boolean functions to express information about run-time states. Many important classes of Boolean functions used this way, such as the monotone functions and the Boolean Horn functions, have simple semantic characterisations. They also have well-known syntactic characterisations in terms of Boolean formulae, say, in conjunctive normal form. Here we are concerned with characterisations using binary decision diagrams. Over the last decade, ROBDDs have become popular as representations of Boolean functions, mainly for their algorithmic properties. Assuming ROBDDs as representation, we address the following problems: Given a function ψ and a class of functions ∆, how to find the strongest ϕ ∈ ∆ entailed by ψ (when such a ϕ is known to exist)? How to find the weakest ϕ ∈ ∆ that entails ψ? How to determine that a function ψ belongs to a class ∆? Answers are important, not only for several program analyses, but for other areas of computer science where Boolean approximation is used. We give, for many commonly used classes ∆ of Boolean functions, algorithms to approximate functions represented as ROBDDs, in the sense described above. The algorithms implement upper closure operators, familiar from abstract interpretation. They immediately lead to algorithms for deciding class membership.

1 Introduction

Propositional logic is of fundamental importance to computer science. While its primary use has been within switching theory, there are many other uses, for example in verification, machine learning, cryptography and program analysis. In complexity theory, Boolean satisfiability has played a seminal role and provided deep and valuable results. Our own interest in Boolean functions stems from work in program analysis. In this area, as in many other practical applications of propositional logic, we are not so much interested in solving Boolean equations as in using Boolean functions to capture properties and relations of interest. In the process of analysing programs, we build and transform representations of Boolean functions, in order to provide detailed information about runtime states.

In this paper we consider various instances of the following problem. Given a Boolean function ϕ and a class of Boolean functions ∆, how can one decide (efficiently) whether ϕ belongs to ∆? How does one find the strongest statement in ∆ which is entailed by ϕ (assuming this is well defined)? Answers of course depend on how Boolean functions are represented.
ROBDDs [4] provide a graphical representation of Boolean functions, based on repeated Boolean development, that is, the principle that in any Boolean algebra,

ϕ = (¬x ∧ ϕ⁰ₓ) ∨ (x ∧ ϕ¹ₓ)

where ϕᵘₓ denotes ϕ with x fixed to the truth value u. In this paper, ROBDDs are used to represent Boolean functions. The classes of Boolean functions studied here are classes that have simple syntactic and semantic characterisations. They are of importance in many areas of computer science, including program analysis. They include the Horn fragment, monotone and antitone Boolean functions, sub-classes of bijunctive functions, and others.

There are many examples of the use of approximation in program analysis. Trivial cases are where a program analysis tool uses intermediate results that are of a finer granularity than what gets reported to a user or used by an optimising compiler. Consider for example classical two-valued strictness analysis [18]. The strictness result for the functional program

g(x,y,z) = if even(x) then y/2 else 3*z + 1

is calculated to be x ∧ (y ∨ z). This contains a disjunctive component indicating that g needs at least one of its last two arguments, in addition to the first. This disjunctive information is not useful for a compiler seeking to replace call-by-name by call-by-value; instead of x ∧ (y ∨ z), a compiler needs the weaker statement x. (Once we have the definitions of Section 3 we can say that what is needed is the strongest V consequence of the calculated result.) That the more fine-grained x ∧ (y ∨ z) is useful as an intermediate result, however, becomes clear when we consider the function

f(u,v) = g(u,v,v)

whose strictness result is u ∧ (v ∨ v), that is, u ∧ v. Without the disjunctive component in g's result, the result for f would be unnecessarily weak.

Less trivial cases are when approximation is needed in intermediate results, to guarantee correctness of the analysis. Genaim and King's suspension analysis for logic programs with dynamic scheduling [9] is an example. The analysis, essentially a greatest-fixed-point computation, produces for each predicate p a Boolean formula ϕ expressing groundness conditions under which the atomic formulae in the body of p may be scheduled so as to obtain suspension-free evaluation. In each iteration, the re-calculation of the formula ϕ includes a crucial step where ϕ is replaced by its weakest monotone implicant. Similarly, set-sharing analysis as presented by Codish et al. [5] relies on an operation that replaces a positive formula by its strongest definite consequence.

The contribution of the present paper is two-fold. First, we view a range of important classes of Boolean functions from a new angle. Studying these classes of Boolean functions through the prism of Boolean development (provided by ROBDDs) yields deeper insight both into ROBDDs and the classes themselves. The first three sections of this paper should therefore be of value to anybody with an interest in the theory of Boolean functions. Second, we give algorithms for ROBDD approximation. The algorithms are novel, pleasingly simple, and follow a common pattern. This more practical contribution may be of interest to anybody who works with ROBDDs, but in particular to those who employ some kind of approximation of Boolean functions, as happens for example in areas of program analysis, cryptography, machine learning and property testing. The reader is assumed to be familiar with propositional logic and ROBDDs.
Section 2 recapitulates ROBDDs and some standard operations, albeit mainly to establish our notation. In Section 3 we define several classes of Boolean functions and establish, for each, relevant properties possessed by their members. Section 4 presents new algorithms for approximating Boolean functions represented as ROBDDs. The aim is to give each method a simple presentation that shows its essence and facilitates a proof of correctness. Section 5 discusses related work and Section 6 concludes.

2 ROBDDs

We briefly recall the essentials of ROBDDs [4]. Let the set V of propositional variables be equipped with a total ordering ≺. Binary decision diagrams (BDDs) are defined inductively as follows:

- 0 is a BDD.
- 1 is a BDD.
- If x ∈ V and R1 and R2 are BDDs then ite(x, R1, R2) is a BDD.

Let R = ite(x, R1, R2). We say a BDD R′ appears in R iff R′ = R or R′ appears in R1 or R2. We define vars(R) = {v | ite(v, _, _) appears in R}. The meaning of a BDD is given as follows:

[[0]] = 0
[[1]] = 1
[[ite(x, R1, R2)]] = (x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])

A BDD is an OBDD iff it is 0 or 1, or if it is ite(x, R1, R2), R1 and R2 are OBDDs, and ∀x′ ∈ vars(R1) ∪ vars(R2) : x ≺ x′. An OBDD R is an ROBDD (Reduced Ordered Binary Decision Diagram, [3, 4]) iff for all BDDs R1 and R2 appearing in R, R1 = R2 when [[R1]] = [[R2]]. Practical implementations [2] use a function mknd(x, R1, R2) to create all ROBDD nodes as follows:

1. If R1 = R2, return R1 instead of a new node, as [[ite(x, R1, R2)]] = [[R1]].
2. If an identical ROBDD was previously built, return that one instead of a new one; this is accomplished by keeping a hash table, called the unique table, of all previously created nodes.
3. Otherwise, return ite(x, R1, R2).

This ensures that ROBDDs are strongly canonical: a shallow equality test is sufficient to determine whether two ROBDDs represent the same Boolean function.

Figure 1 shows an example ROBDD in diagrammatic form. This ROBDD denotes the function (x ← y) → z. An ROBDD ite(x, R1, R2) is depicted as a directed acyclic graph rooted in x, with a solid arc from x to the dag for R1 and a dashed arc from x to the dag for R2.

[Fig. 1. The ROBDD for (x ← y) → z.]

It is important to take advantage of fan-in to create efficient ROBDD algorithms. Often some ROBDD nodes will appear multiple times in a given ROBDD, and algorithms that traverse that ROBDD will meet these nodes multiple times. Many algorithms can avoid repeated work by keeping a cache of previously seen inputs and their corresponding outputs, called a computed table. See Brace et al. [2] for details. We assume a computed table is used for all recursive ROBDD algorithms presented here.
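As an illustration only (this sketch is ours, not part of the paper), the constructions just described fit in a few lines of Python. The names ORDER, mknd, apply_op, bdd_or and bdd_and are invented; a node ite(x, R1, R2) is represented as the tuple (x, R1, R2), and the terminals are the integers 0 and 1. Later sketches below reuse these definitions.

    ORDER = {"x": 0, "y": 1, "z": 2}        # an assumed variable ordering

    unique = {}                             # the unique table of Section 2

    def mknd(x, r1, r2):
        if r1 == r2:                        # step 1: ite(x, R, R) denotes R
            return r1
        key = (x, r1, r2)
        return unique.setdefault(key, key)  # steps 2 and 3: reuse or create

    def is_leaf(r):
        return r in (0, 1)

    computed = {}                           # the computed table

    def apply_op(op, r1, r2):
        """Combine two ROBDDs by a binary connective op : B x B -> B."""
        if is_leaf(r1) and is_leaf(r2):
            return 1 if op(r1, r2) else 0
        key = (op, r1, r2)
        if key not in computed:
            # develop around the earliest variable of the two roots
            x = min((r[0] for r in (r1, r2) if not is_leaf(r)), key=ORDER.get)
            h1, l1 = (r1[1], r1[2]) if not is_leaf(r1) and r1[0] == x else (r1, r1)
            h2, l2 = (r2[1], r2[2]) if not is_leaf(r2) and r2[0] == x else (r2, r2)
            computed[key] = mknd(x, apply_op(op, h1, h2), apply_op(op, l1, l2))
        return computed[key]

    def OR(a, b):  return a or b
    def AND(a, b): return a and b
    def bdd_or(r1, r2):  return apply_op(OR, r1, r2)
    def bdd_and(r1, r2): return apply_op(AND, r1, r2)

For example, the ROBDD for y ∨ z is bdd_or(mknd("y", 1, 0), mknd("z", 1, 0)). Because the unique table hands back a canonical tuple for each (x, R1, R2) triple, equal functions end up represented by the same node, which is what makes the shallow equality test cheap.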
3 Boolean Function Classes as Closure Operators

Let B = {0, 1}. A Boolean function over variable set V = {x1, ..., xn} is a function ϕ : Bⁿ → B. We assume a fixed, finite set V of variables, and use B to denote the set of all Boolean functions over V. The ordering on B is the usual one: x ≤ y iff x = 0 ∨ y = 1. B is ordered pointwise, that is, the ordering relation is entailment, |=. The class C ⊂ B contains just the two constant functions. As is common, we use '0' and '1' not only to denote the elements of B, but also for the elements of C. The class 1 ⊂ C contains only the element 1.

A valuation µ : V → B is an assignment of truth values to the variables in V. Valuations are ordered pointwise. We will sometimes write a valuation as the set of variables which are assigned the value 1. In this view, the meet operation on valuations is set intersection, and the join is set union. A valuation µ is a model for ϕ, denoted µ |= ϕ, if ϕ(µ(x1), ..., µ(xn)) = 1. In the "set" view, the set of models of ϕ is a set of sets of variables, namely

[[ϕ]]V = { {x ∈ V | µ(x) = 1} | µ |= ϕ }.

Again, we will often omit the subscript V as it will be clear from the context. We will also switch freely amongst the views of ϕ as a function, a formula, and as its set of models, relying on the reader to disambiguate from context.

We say that a Boolean function ϕ is model-meet closed if, whenever µ |= ϕ and µ′ |= ϕ, we also have µ ∩ µ′ |= ϕ. In other words, ϕ is model-meet closed if [[ϕ]] is closed under intersection. Similarly, ϕ is model-join closed if, whenever µ |= ϕ and µ′ |= ϕ, also µ ∪ µ′ |= ϕ. We likewise say that a Boolean function ϕ is downward closed if, whenever µ |= ϕ, we also have µ ∩ µ′ |= ϕ for all valuations µ′, and similarly ϕ is upward closed if, whenever µ |= ϕ, we also have µ ∪ µ′ |= ϕ for all valuations µ′. Note that a downward closed function is necessarily model-meet closed, and an upward closed function is model-join closed.

Let V̄ = {¬x | x ∈ V} be the set of negated variables. A literal is a member of the set L = V ∪ V̄, that is, a variable or a negated variable. We say that ϕ is independent of literal x (and also of literal ¬x) when for all models µ of ϕ, µ \ {x} |= ϕ iff µ ∪ {x} |= ϕ.

The dual of a Boolean function ϕ is the function that is obtained by interchanging the roles of 1 and 0. A simple way of turning a formula for ϕ into a formula for ϕ's dual is to change the sign of every literal in ϕ and negate the whole resulting formula. For example, the dual of x ∧ (y ∨ z) is x ∨ (y ∧ z); De Morgan's laws can be regarded as duality laws. Define ϕ◦ as the dual of ¬ϕ. Following Halmos [13], we call ϕ◦ the contra-dual of ϕ. Clearly, given a formula for ϕ, a formula for ϕ◦ is obtained by changing the sign of each literal in ϕ. As an example, ((x ↔ y) → z)◦ = (x ↔ y) → ¬z. Alternatively, given a truth table for a Boolean function, the truth table for its contra-dual is obtained by turning the result column upside down.

Given an ROBDD R for ϕ, we can also talk about R's contra-dual, R◦, which represents ϕ◦. An ROBDD's contra-dual is obtained by simultaneously making all solid arcs dashed, and all dashed arcs solid. Clearly the mapping ϕ ↦ ϕ◦ is an involution, and monotone: ψ |= ϕ iff ψ◦ |= ϕ◦. Note that ϕ◦ is model-join closed iff ϕ is model-meet closed. For any class ∆ of Boolean functions, we let ∆◦ denote the class {ϕ◦ | ϕ ∈ ∆}.
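In the sketch notation of Section 2 (again ours), the contra-dual is exactly the branch swap just described, with a computed table so that shared nodes are visited only once:

    contra = {}                    # computed table for the contra-dual

    def contra_dual(r):
        # swap every solid/dashed pair; the terminals are unchanged
        if r in (0, 1):
            return r
        if r not in contra:
            x, r1, r2 = r
            contra[r] = mknd(x, contra_dual(r2), contra_dual(r1))
        return contra[r]

Since the mapping is an involution, contra_dual(contra_dual(R)) == R, which is a cheap sanity check on an implementation.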
The classes of Boolean functions considered in this paper can all be seen as upper closures of B. Recall that an upper closure operator (or just uco) ρ : L → L on a complete lattice L satisfies the following constraints:

- It is monotone: x ⊑ y implies ρ(x) ⊑ ρ(y) for all x, y ∈ L.
- It is extensive: x ⊑ ρ(x) for all x ∈ L.
- It is idempotent: ρ(x) = ρ(ρ(x)) for all x ∈ L.

As each class ∆ under study contains 1 and is closed under conjunction (this will be obvious from the syntactic characterisations given below), ∆ is a lattice. Moreover, the mapping ρ∆ : B → B, defined by

ρ∆(ψ) = ⋀{ϕ ∈ ∆ | ψ |= ϕ}

is a uco on B. Since it is completely determined by ∆, and vice versa, we will usually denote ρ∆ simply as ∆. The view of such classes (abstract domains) ∆ as upper closure operators has been popular ever since the seminal papers on abstract interpretation [6, 7].

We list some well-known properties of closure operators [23, 6]. Let L be a complete lattice (L, ⊑, ⊥, ⊤, ⊔, ⊓) and let ρ : L → L be a uco. Then ρ(L) is a complete lattice (ρ(L), ⊑, ρ(⊥), ⊤, λS. ρ(⊔S), ⊓). It is a complete sublattice of L if and only if ρ is additive, that is, ρ(⊔S) = ⊔ρ(S) for all S ⊆ L. In any case,

ρ(⊓S) ⊑ ⊓ρ(S) = ρ(⊓ρ(S))    (1)
⊔ρ(S) ⊑ ρ(⊔S) = ρ(⊔ρ(S))    (2)

Given two upper closure operators ρ and ρ′ on the same lattice L, ρ ◦ ρ′ need not be an upper closure operator. However, if ρ ◦ ρ′ = ρ′ ◦ ρ then the composition is also an upper closure operator, and ρ(ρ′(L)) = ρ′(ρ(L)) = ρ(L) ∩ ρ′(L) [10, 19].

The Boolean function classes we focus on here are those characterised by combinations of certain interesting semantic properties: model-meet closure, model-join closure, downward closure, and upward closure. Nine classes are spanned, as shown in the Hasse diagram of Figure 2. These classes are chosen for their importance in program analysis, although our method applies to many other natural classes, as we argue in Section 5.

[Fig. 2. Boolean function classes: a Hasse diagram with B at the top; H and H◦ below it; then V→, M and M◦; then V and V◦; and C at the bottom.]

We define H to be the set of model-meet closed Boolean functions, so H◦ is the set of model-join closed functions. Similarly, we define M to be the upward-closed functions, with M◦ the downward-closed functions. We define V→ to be the set of Boolean functions that are both model-meet and model-join closed, i.e., H ∩ H◦. In what follows we utilise that V→ is a uco, and therefore H ◦ H◦ = H◦ ◦ H = V→. We also define V = H ∩ M = V→ ∩ M and V◦ = H◦ ∩ M◦ = V→ ∩ M◦, both of which are ucos as well. Finally, we observe that C = M ∩ M◦ = V ∩ V◦ is also a uco.

One can think of these elements as classes of functions, ordered by the subset ordering, or alternatively, as upper closure operators, ordered pointwise (in particular, B is the identity function, providing loss-less approximation). These classes (ucos) have a number of properties in common. All are closed under conjunction and existential quantification. None are closed under negation, and hence none are closed under universal quantification. All are closed under the operation of fixing a variable to a truth value: we can express instantiation using only conjunction and existential quantification, writing the fixing of x in ϕ to 0 as ϕ⁰ₓ ≡ ∃x : ϕ ∧ ¬x and the fixing of x in ϕ to 1 as ϕ¹ₓ ≡ ∃x : ϕ ∧ x. Finally, all of these classes enjoy a property that is essential to our algorithms: they do not introduce variables. For each uco ∆ considered and each x ∈ V and ϕ ∈ B, if ϕ is independent of x, then so is ∆(ϕ). In Section 5 we discuss a case where this property fails to hold.

We now define the various classes formally and establish some results that are essential in establishing the correctness of the algorithms given in Section 4.

3.1 The Classes M and M◦

The class M of monotone functions consists of the functions ϕ satisfying the following requirement: for all valuations µ and µ′, µ ∪ µ′ |= ϕ when µ |= ϕ. (These functions are also referred to as isotone.) Syntactically the class is most conveniently described as the class of functions generated by {∧, ∨, 0, 1}; see for example Rudeanu's Theorem 11.3 [21]. It follows that the uco M is additive.

The class M◦ = {ϕ◦ | ϕ ∈ M} consists of the antitone Boolean functions. Functions ϕ in this class have the property that, for all valuations µ and µ′, if µ |= ϕ, then µ ∩ µ′ |= ϕ. M◦, too, is additive.
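For small variable sets, the semantic closure properties of this section can be checked by brute force. The following test harness is ours, not the paper's; valuations are frozensets of variables, following the "set" view above. It is convenient for experimenting with the classes and, below, for testing the lemmas.

    from itertools import combinations

    VARS = ("x", "y", "z")                       # an assumed small variable set

    def all_valuations(vs=VARS):
        return [frozenset(c) for n in range(len(vs) + 1)
                for c in combinations(vs, n)]

    def models(phi, vs=VARS):
        """phi is a Python predicate on frozensets of variables."""
        return {m for m in all_valuations(vs) if phi(m)}

    def model_meet_closed(ms):                   # membership of H
        return all(a & b in ms for a in ms for b in ms)

    def model_join_closed(ms):                   # membership of H◦
        return all(a | b in ms for a in ms for b in ms)

    def upward_closed(ms, vs=VARS):              # membership of M
        return all(a | b in ms for a in ms for b in all_valuations(vs))

    def downward_closed(ms, vs=VARS):            # membership of M◦
        return all(a & b in ms for a in ms for b in all_valuations(vs))

For instance, models(lambda m: ("x" in m) <= ("z" in m)), the model set of x → z, is both model-meet and model-join closed but neither upward nor downward closed, placing x → z in V→ but outside M and M◦.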
As ROBDDs are based on the idea of repeated Boolean development, we are particularly interested in characterising class membership for formulae of the forms x ∨ ϕ and ¬x ∨ ϕ (with ϕ independent of x).

Lemma 1. Let ϕ ∈ B be independent of x ∈ V. Then
(a) x ∨ ϕ ∈ M iff ϕ ∈ M
(b) ¬x ∨ ϕ ∈ M iff ϕ ∈ 1
(c) x ∨ ϕ ∈ M◦ iff ϕ ∈ 1
(d) ¬x ∨ ϕ ∈ M◦ iff ϕ ∈ M◦

Proof: In all cases, the 'if' direction is obvious from the well-known syntactic characterisations of M and M◦. We show the 'only if' direction for cases (a) and (b); the proofs for (c) and (d) are similar.

(a) Let ϕ ∈ B be independent of x ∈ V, such that x ∨ ϕ ∈ M. Consider a model µ of ϕ. Since ϕ is independent of x, we have that µ \ {x} |= x ∨ ϕ. Let µ′ be an arbitrary valuation. Then (µ \ {x}) ∪ (µ′ \ {x}) |= x ∨ ϕ, so (µ ∪ µ′) \ {x} |= ϕ. Thus µ ∪ µ′ |= ϕ, and since µ′ was arbitrary, ϕ ∈ M.

(b) Suppose ¬x ∨ ϕ ∈ M. We show that every valuation is a model of ϕ. For any valuation µ, µ \ {x} |= ¬x ∨ ϕ. But then µ ∪ {x} |= ¬x ∨ ϕ, as ¬x ∨ ϕ ∈ M. As ϕ is independent of x, µ |= ϕ. But µ was arbitrary, so ϕ must be 1.
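Lemma 1 can be confirmed exhaustively with the harness above: let ϕ range over all sixteen functions of {y, z}, hence independent of x, and compare both sides of (a) and (b). This check is ours, not the paper's.

    def check_lemma_1():
        yz = all_valuations(("y", "z"))
        for bits in range(1 << len(yz)):                 # every phi over {y, z}
            phi_models = {m for i, m in enumerate(yz) if bits >> i & 1}
            phi = lambda m, pm=phi_models: (m & frozenset(("y", "z"))) in pm
            x_or_phi  = models(lambda m: "x" in m or phi(m))       # x or phi
            nx_or_phi = models(lambda m: "x" not in m or phi(m))   # not-x or phi
            phi_in_M = upward_closed(models(phi))
            phi_is_1 = models(phi) == set(all_valuations())
            assert upward_closed(x_or_phi)  == phi_in_M            # Lemma 1(a)
            assert upward_closed(nx_or_phi) == phi_is_1            # Lemma 1(b)

    check_lemma_1()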
3.2 The Classes H and H◦

The class H of propositional Horn functions is exactly the set of model-meet closed Boolean functions. That is, every H function ϕ satisfies the following requirement: for all valuations µ and µ′, if µ |= ϕ and µ′ |= ϕ, then µ ∩ µ′ |= ϕ. Similarly, H◦ is the set of model-join closed Boolean functions, satisfying the requirement that for all valuations µ and µ′, if µ |= ϕ and µ′ |= ϕ, then µ ∪ µ′ |= ϕ. There are well-known syntactic characterisations of these classes. H is the set of functions that can be written in conjunctive normal form with each clause (ℓ1 ∨ · · · ∨ ℓn) containing at most one positive literal ℓi, while H◦ functions can be written in conjunctive normal form with each clause containing at most one negative literal. (An unfortunate variety of nomenclatures is used in Boolean taxonomy. For example, Schaefer [22] uses "weakly negative" for H and "weakly positive" for H◦. Ekin et al. [8] use the term "Horn" to refer to {¬ϕ | ϕ ∈ H} and "positive" for M, while we use the word "positive" to refer to another class entirely; see Section 5.)

It is immediate that M ⊆ H◦ and M◦ ⊆ H. The next lemma characterises membership of H and H◦ for the case ℓ ∨ ϕ, where ℓ is a literal.

Lemma 2. Let ϕ ∈ B be independent of x ∈ V. Then
(a) x ∨ ϕ ∈ H iff ϕ ∈ M◦
(b) ¬x ∨ ϕ ∈ H iff ϕ ∈ H
(c) x ∨ ϕ ∈ H◦ iff ϕ ∈ H◦
(d) ¬x ∨ ϕ ∈ H◦ iff ϕ ∈ M

Proof: In all cases, the 'if' direction follows easily from the syntactic characterisations of the classes. We prove the 'only if' directions for (a) and (b), as (c) and (d) are similar.

(a) Assume x ∨ ϕ ∈ H and ϕ is independent of x. Let µ be a model for ϕ and let µ′ be an arbitrary valuation. Both µ \ {x} and µ′ ∪ {x} are models for x ∨ ϕ. As x ∨ ϕ ∈ H, their intersection is a model as well, that is, (µ ∩ µ′) \ {x} |= x ∨ ϕ. But then (µ ∩ µ′) \ {x} |= ϕ, hence µ ∩ µ′ |= ϕ, and as µ′ was arbitrary, it follows that ϕ ∈ M◦.

(b) Assume ¬x ∨ ϕ ∈ H and ϕ is independent of x. Consider models µ and µ′ for ϕ. As ϕ is independent of x, µ \ {x} and µ′ \ {x} are models for ¬x ∨ ϕ, and so (µ \ {x}) ∩ (µ′ \ {x}) |= ¬x ∨ ϕ. But then (µ \ {x}) ∩ (µ′ \ {x}) |= ϕ, hence µ ∩ µ′ |= ϕ, so ϕ ∈ H.

3.3 The Class V→

We define V→ = H ∩ H◦. Hence this is the class of Boolean functions ϕ that are both model-meet and model-join closed: for all valuations µ and µ′, µ ∩ µ′ |= ϕ and µ ∪ µ′ |= ϕ when µ |= ϕ and µ′ |= ϕ. Since H and H◦ commute as closure operators, we could equally well have defined V→ = H ◦ H◦. Syntactically, V→ consists of exactly those Boolean functions that can be written in conjunctive normal form ⋀c with each clause c taking one of four forms: 0, x, ¬x, or x → y. Note that V→◦ = V→.

Lemma 3. Let ϕ ∈ B be independent of x ∈ V. Then
(a) x ∨ ϕ ∈ V→ iff ϕ ∈ V◦
(b) ¬x ∨ ϕ ∈ V→ iff ϕ ∈ V

Proof: (a) Since x ∨ ϕ ∈ V→, we know that x ∨ ϕ ∈ H and x ∨ ϕ ∈ H◦. Then by Lemma 2, ϕ ∈ M◦ and ϕ ∈ H◦. Thus ϕ ∈ V◦. The proof for (b) is similar.

3.4 The Classes V, V◦, C, and 1

We define V to be the class of model-meet and upward closed Boolean functions. Syntactically, ϕ ∈ V iff ϕ = 0 or ϕ can be written as a (possibly empty) conjunction of positive literals. Dually, V◦ is the class of model-join and downward closed Boolean functions: those that can be written as 0 or as (possibly empty) conjunctions of negative literals. C is the set of Boolean functions that are both upward and downward closed. This set contains only the constant functions 0 and 1. Finally, 1 consists of only the constant function 1. The next lemma is trivial, but included for completeness.

Lemma 4. Let ϕ ∈ B be independent of x ∈ V. Then
(a) x ∨ ϕ ∈ V iff ϕ ∈ C
(b) ¬x ∨ ϕ ∈ V iff ϕ ∈ 1
(c) x ∨ ϕ ∈ V◦ iff ϕ ∈ 1
(d) ¬x ∨ ϕ ∈ V◦ iff ϕ ∈ C
(e) x ∨ ϕ ∈ C iff ϕ ∈ 1
(f) ¬x ∨ ϕ ∈ C iff ϕ ∈ 1
(g) x ∨ ϕ ∈ 1 iff ϕ ∈ 1
(h) ¬x ∨ ϕ ∈ 1 iff ϕ ∈ 1

Proof: The proof is similar to that of Lemma 3.

4 Algorithms for Approximating ROBDDs

In this section we show how to find upper approximations within the classes of the previous section. Algorithms for lower approximation can be obtained in a parallel way. We assume input and output given as ROBDDs. In this context, the main obstacle to the development of algorithms is a lack of distributivity and substitutivity properties amongst closure operators. To exemplify the problems in the context of H, given ϕ = x ↔ y and ψ = x ∨ y, we have

H(ϕ ∧ ψ) = H(x ∧ y) = x ∧ y ≠ (x ↔ y) ∧ 1 = H(ϕ) ∧ H(ψ)
H(x ∨ y) = 1 ≠ x ∨ y = H(x) ∨ H(y)
(H(x ∨ y))⁰ₓ = 1⁰ₓ = 1 ≠ y = H(y) = H((x ∨ y)⁰ₓ)

Nevertheless, for a large number of commonly used classes, Boolean development gives us a handle to restore a limited form of distributivity. The idea is as follows. Let σ = (x ∧ ϕ) ∨ (¬x ∧ ψ). We can write σ alternatively as

σ = (¬ψ → x) ∧ (x → ϕ)

showing how the "subtrees" ϕ and ψ communicate with x. As we have seen, we cannot in general find ρ(σ), even for "well-behaved" closure operators ρ, by distribution; the following does not hold:

ρ(σ) = ρ(¬ψ → x) ∧ ρ(x → ϕ)

Suppose however that we add a redundant conjunct to the expression for σ:

σ = (¬ψ → x) ∧ (x → ϕ) ∧ (ϕ ∨ ψ)

The term ϕ ∨ ψ is redundant, as it is nothing but ∃x(σ) (indeed σ¹ₓ = ϕ and σ⁰ₓ = ψ). The point that we utilise here is that, for a large number of natural classes (or upper closure operators) ρ, distribution is allowed in this context, that is,

ρ(σ) = ρ(¬ψ → x) ∧ ρ(x → ϕ) ∧ ρ(ϕ ∨ ψ)

The intuition is that the information that is lost by ρ(¬ψ → x) ∧ ρ(x → ϕ), namely the "ρ" information shared by ϕ and ψ, is exactly recovered by ρ(ϕ ∨ ψ).

Figure 3 shows, for reference, the ROBDD for the function (x → z) ∧ (y ↔ ¬z), and the ROBDDs that result from different approximations.

[Fig. 3. ROBDDs for ϕ = (x → z) ∧ (y ↔ ¬z) and some approximations of it: M(ϕ) = y ∨ z; H(ϕ) = (x → z) ∧ (¬z ∨ ¬y); M◦(ϕ) = (¬x ∨ ¬y) ∧ (¬z ∨ ¬y); H◦(ϕ) = (x → z) ∧ (y ∨ z); V→(ϕ) = x → z.]

Before we present our approximation algorithms, we need one more lemma.

Lemma 5. Let ϕ ∈ B be independent of x ∈ V.
(a) If (x ∨ ϕ ∈ ∆) ↔ (ϕ ∈ ∆′) then ∆(x ∨ ϕ) = x ∨ ∆′(ϕ).
(b) If (¬x ∨ ϕ ∈ ∆) ↔ (ϕ ∈ ∆′) then ∆(¬x ∨ ϕ) = ¬x ∨ ∆′(ϕ).
Proof: We show (a); the proof for (b) is similar. Assume (x ∨ ϕ ∈ ∆) ↔ (ϕ ∈ ∆′). Then

∆(x ∨ ϕ)
= ⋀{ψ′ ∈ ∆ | x ∨ ϕ |= ψ′}
= ⋀{ψ′ ∈ ∆ | x |= ψ′ and ϕ |= ψ′}
= ⋀{x ∨ ψ | x ∨ ψ ∈ ∆ and ϕ |= x ∨ ψ}    (ψ′ is of the form x ∨ ψ)
= ⋀{x ∨ ψ | x ∨ ψ ∈ ∆ and ϕ |= ψ}         (ϕ is independent of x)
= ⋀{x ∨ ψ | ψ ∈ ∆′ and ϕ |= ψ}            (premise)
= x ∨ ⋀{ψ | ψ ∈ ∆′ and ϕ |= ψ}
= x ∨ ∆′(ϕ).

4.1 The Upper Closure Operators H and H◦

Algorithm 1. To find the strongest H consequence of a Boolean function:

H(0) = 0
H(1) = 1
H(ite(x, R1, R2)) = mknd(x, Rt, Rf)
  where R′ = H(or(R1, R2)) and Rt = H(R1) and Rf = and(M◦(R2), R′)

To prove the correctness of this algorithm, we shall need the following lemma.

Lemma 6. Let ϕ ∈ B be independent of x ∈ V. Then
(a) H(x ∧ ϕ) = x ∧ H(ϕ)
(b) H(¬x ∧ ϕ) = ¬x ∧ H(ϕ)
(c) H◦(x ∧ ϕ) = x ∧ H◦(ϕ)
(d) H◦(¬x ∧ ϕ) = ¬x ∧ H◦(ϕ)

Proposition 1. For any ROBDD R, H[[R]] = [[H(R)]].

Proof: By induction on vars(R). When vars(R) = ∅, R must be either 0 or 1; in these cases the proposition holds. Assume vars(R) ≠ ∅ and take R = ite(x, R1, R2). vars(R) ⊃ vars(or(R1, R2)), so the induction is well-founded. Let R′ = H(or(R1, R2)). By the induction hypothesis, [[R′]] = H[[or(R1, R2)]] = H([[R1]] ∨ [[R2]]).

We prove first that H[[R]] |= [[H(R)]]. Note that

x ∨ H(ψ) |= H(x ∨ H(ψ)) = H(x ∨ ψ)
¬x ∨ H(ϕ) |= H(¬x ∨ H(ϕ)) = H(¬x ∨ ϕ)
H(ϕ) ∨ H(ψ) |= H(H(ϕ) ∨ H(ψ)) = H(ϕ ∨ ψ)

Since H and ∧ are monotone,

H[(x ∨ H(ψ)) ∧ (¬x ∨ H(ϕ)) ∧ (H(ϕ) ∨ H(ψ))] |= H[H(x ∨ ψ) ∧ H(¬x ∨ ϕ) ∧ H(ϕ ∨ ψ)].

Hence, by (1),

H[(x ∨ H(ψ)) ∧ (¬x ∨ H(ϕ)) ∧ (H(ϕ) ∨ H(ψ))] |= H(x ∨ ψ) ∧ H(¬x ∨ ϕ) ∧ H(ϕ ∨ ψ)    (3)

Now we have

H[[R]]
= H[(x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])]
= H(H(x ∧ [[R1]]) ∨ H(¬x ∧ [[R2]]))                           (uco property)
= H[(x ∧ H[[R1]]) ∨ (¬x ∧ H[[R2]])]                            (Lemma 6)
= H[(x ∨ H[[R2]]) ∧ (¬x ∨ H[[R1]])]                            (distribution)
= H[(x ∨ H[[R2]]) ∧ (¬x ∨ H[[R1]]) ∧ (H[[R1]] ∨ H[[R2]])]       (redundant conjunct)
|= H(x ∨ [[R2]]) ∧ H(¬x ∨ [[R1]]) ∧ H([[R1]] ∨ [[R2]])          (Equation 3)
= (x ∨ M◦[[R2]]) ∧ (¬x ∨ H[[R1]]) ∧ [[R′]]                      (Lemmas 2 and 5)
= (x ∧ H[[R1]] ∧ [[R′]]) ∨ (¬x ∧ M◦[[R2]] ∧ [[R′]])              (distribution)
= (x ∧ H[[R1]]) ∨ (¬x ∧ M◦[[R2]] ∧ [[R′]])                      (H is monotone)
= (x ∧ [[H(R1)]]) ∨ (¬x ∧ [[M◦(R2)]] ∧ [[R′]])                  (Ind. hyp., Prop. 4)
= [[mknd(x, H(R1), and(M◦(R2), R′))]] = [[H(R)]]

Next we show [[H(R)]] |= H[[R]]. From the development above, it is clear that this amounts to showing that

(x ∧ H[[R1]] ∧ [[R′]]) ∨ (¬x ∧ M◦[[R2]] ∧ [[R′]]) |= H[(x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])]

By Lemma 6, x ∧ H[[R1]] = H(x ∧ [[R1]]). So clearly x ∧ H[[R1]] ∧ [[R′]] |= H(x ∧ [[R1]]) ∨ H(¬x ∧ [[R2]]), so x ∧ H[[R1]] ∧ [[R′]] |= H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])). It remains to prove that ¬x ∧ M◦[[R2]] ∧ [[R′]] |= H[(x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])]. If the left-hand side is false, then the claim holds trivially. So let µ be a model of ¬x ∧ M◦[[R2]] ∧ [[R′]]. Thus µ |= ¬x, µ |= M◦[[R2]], and µ |= [[R′]], and we must show µ entails the right-hand side. Let us consider three exhaustive cases. First assume µ |= H[[R2]]. Then since µ |= ¬x, and by Lemma 6, µ |= H(¬x ∧ [[R2]]), so certainly µ |= H(x ∧ [[R1]]) ∨ H(¬x ∧ [[R2]]). Then µ must entail the weaker H(H(x ∧ [[R1]]) ∨ H(¬x ∧ [[R2]])), which by uco properties is equivalent to H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])).
Next assume µ |= H[[R1]] and µ ⊭ H[[R2]]. Since µ |= M◦[[R2]], we know that there is some µ′ such that µ′ |= [[R2]], and that µ ⊆ µ′. Then µ′ \ {x} |= (x ∧ [[R1]]) ∨ (¬x ∧ [[R2]]), so µ′ \ {x} must entail the weaker H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])). We have also assumed µ |= H[[R1]], so by a similar argument µ ∪ {x} |= H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])). Then (µ ∪ {x}) ∩ (µ′ \ {x}) |= H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])). But since µ ⊆ µ′, and since x ∉ µ, (µ ∪ {x}) ∩ (µ′ \ {x}) = µ, and therefore µ |= H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])).

Finally, assume µ ⊭ H[[R1]] and µ ⊭ H[[R2]]. Since µ |= [[R′]], µ must be the intersection of models of H[[R1]] and H[[R2]]. So let µ+ and µ− be interpretations such that µ+ |= H[[R1]] and µ− |= H[[R2]] and µ = µ+ ∩ µ−. Then, similarly to the previous case, (µ+ ∪ {x}) |= H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])) and (µ− \ {x}) |= H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])). But since µ = µ+ ∩ µ−, we know (µ+ ∪ {x}) ∩ (µ− \ {x}) = µ, and therefore µ |= H((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]])).

Algorithm 2. To find the strongest H◦ consequence of a Boolean function:

H◦(0) = 0
H◦(1) = 1
H◦(ite(x, R1, R2)) = mknd(x, Rt, Rf)
  where R′ = H◦(or(R1, R2)) and Rt = and(M(R1), R′) and Rf = H◦(R2)

Proposition 2. For any ROBDD R, H◦[[R]] = [[H◦(R)]].

Proof: Similar to Proposition 1.

4.2 The Upper Closure Operators M and M◦

The algorithms and proofs for M and M◦ are simpler, because these closure operators are additive.

Algorithm 3. To find the strongest M consequence of a Boolean function:

M(0) = 0
M(1) = 1
M(ite(x, R1, R2)) = mknd(x, or(R1′, R2′), R2′) where R1′ = M(R1) and R2′ = M(R2)

Proposition 3. For any ROBDD R, M[[R]] = [[M(R)]].

Proof: By structural induction. For R = 0 and R = 1 the proposition clearly holds. Consider R = ite(x, R1, R2) and let R1′ = M(R1) and R2′ = M(R2).

M[[R]]
= M((x ∧ [[R1]]) ∨ (¬x ∧ [[R2]]))
= (M(x) ∧ M[[R1]]) ∨ (M(¬x) ∧ M[[R2]])        (M is additive)
= (x ∧ M[[R1]]) ∨ M[[R2]]
= (x ∧ [[R1′]]) ∨ [[R2′]]                       (induction hypothesis)
= (x ∧ ([[R1′]] ∨ [[R2′]])) ∨ (¬x ∧ [[R2′]])     (development around x)
= (x ∧ [[or(R1′, R2′)]]) ∨ (¬x ∧ [[R2′]])
= [[mknd(x, or(R1′, R2′), R2′)]] = [[M(R)]]

Algorithm 4. To find the strongest M◦ consequence of a Boolean function:

M◦(0) = 0
M◦(1) = 1
M◦(ite(x, R1, R2)) = mknd(x, R1′, or(R1′, R2′)) where R1′ = M◦(R1) and R2′ = M◦(R2)

Proposition 4. For any ROBDD R, M◦[[R]] = [[M◦(R)]].

Proof: Similar to Proposition 3.

4.3 The Upper Closure Operator V→

Algorithm 5. To find the strongest V→ consequence of a Boolean function:

V→(0) = 0
V→(1) = 1
V→(ite(x, R1, R2)) = mknd(x, and(V(R1), R′), and(V◦(R2), R′)) where R′ = V→(or(R1, R2))

Proposition 5. For any ROBDD R, V→[[R]] = [[V→(R)]].

Proof: This follows from the fact that V→ = H ∩ H◦. We omit the details.

4.4 The Upper Closure Operators C, V, and V◦

The remaining algorithms are given here for completeness. Their correctness proofs are straightforward.

Algorithm 6. To find the strongest V consequence of a Boolean function:

V(0) = 0
V(1) = 1
V(ite(x, R1, R2)) = mknd(x, R′, and(C(R2), R′)) where R′ = V(or(R1, R2))

Algorithm 7. To find the strongest V◦ consequence of a Boolean function:

V◦(0) = 0
V◦(1) = 1
V◦(ite(x, R1, R2)) = mknd(x, and(C(R1), R′), R′) where R′ = V◦(or(R1, R2))

Algorithm 8. To find the strongest C consequence of a Boolean function:

C(0) = 0
C(1) = 1
C(ite(x, R1, R2)) = 1
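The common pattern is easy to see when some of the algorithms are transcribed into the Python kernel sketched in Section 2 (mknd, bdd_or and bdd_and as before; the transcription is ours, and per-operator computed tables are elided for brevity):

    def M(r):                                   # Algorithm 3
        if r in (0, 1):
            return r
        x, r1, r2 = r
        return mknd(x, bdd_or(M(r1), M(r2)), M(r2))

    def M_dual(r):                              # Algorithm 4 (the operator M◦)
        if r in (0, 1):
            return r
        x, r1, r2 = r
        return mknd(x, M_dual(r1), bdd_or(M_dual(r1), M_dual(r2)))

    def H(r):                                   # Algorithm 1
        if r in (0, 1):
            return r
        x, r1, r2 = r
        r_prime = H(bdd_or(r1, r2))             # R' = H(or(R1, R2))
        return mknd(x, H(r1), bdd_and(M_dual(r2), r_prime))

    def is_member(closure, r):
        """Class membership: R is in the class iff the closure fixes R."""
        return closure(r) == r

Algorithms 2 and 5 through 8 transcribe in exactly the same few-line shape, and is_member anticipates the membership test noted in the Conclusion.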
5 Discussion and Related Work

The classes we have covered are but a few examples of the generality of our approach. Many other classes fall under the same general scheme as the algorithms in Section 4. One such is L: syntactically, ϕ ∈ L iff ϕ = 0 or ϕ can be written as a (possibly empty) conjunction of literals. Slightly more general is the class Bij of bijunctive functions. Members of this class can be written in clausal form with at most two literals per clause.

A class central to many analyses of logic programs is that of positive functions [16, 17]. Let µ be the unit valuation, that is, µ(x) = 1 for all x ∈ V. Then ϕ is positive iff µ |= ϕ. We denote the class of positive functions by Pos. This class is interesting in the context of ROBDDs, as it is a class which is easily recognisable but problematic to find approximations in. To decide whether an ROBDD represents a positive function, simply follow the solid-arc path from the root to a sink; the function is positive if and only if the sink is 1. Approximation, however, cannot be done in general without knowledge of the entire space of variables, and not all variables necessarily appear in the ROBDD. For example, if the set of variables is V, then Pos(¬xi) = xi → ⋀V, which depends on every variable in V. We should note, however, that this does not mean that our approximation algorithms are useless for sub-classes of Pos. On the contrary, they work seamlessly for the positive sub-classes commonly used in program analysis, discussed below, as long as positive functions are being approximated (which is invariably the case).

The classes we have discussed above are not sub-classes of Pos. (In both M and V, however, the only non-positive element is 0.) Restricting the classes to their positive counterparts, we obtain classes that all have found use in program analysis. Figure 4 shows the correspondence. The classes on the right are obtained by intersecting those on the left with Pos.

[Fig. 4. Positive fragments: B corresponds to Pos, H to Def, and V→ to 2IMP.]

We mention just a few example uses. In the context of groundness analysis for constraint logic programs, Pos and Def are discussed by Armstrong et al. [1]. Def is used for example by Howe and King [15]. 2IMP is found in the exception analysis of Glynn et al. [12]. We have also omitted characterizations and algorithms for V↔, the class of functions that can be written as conjunctions of literals and bi-implications of the form x ↔ y with x, y ∈ V. This class corresponds to the set of all possible partitionings of V ∪ {0, 1}. Its restriction to the positive fragment is exactly Heaton et al.'s "EPos" domain [14].

M is a class which is of considerable interest in many contexts. In program analysis it has a classical role: Mycroft's well-known two-valued strictness analysis for first-order functional programs [18] uses M to capture non-termination information.

The classes we have considered are also of much theoretical interest. The classes Bij, H, H◦, Pos and Pos◦ are five of the six classes from Schaefer's dichotomy result [22] (the sixth is the class of affine Boolean functions). M plays a role in Post's functional completeness result [20], together with the affine functions, Pos and its dual, and the class of self-dual functions. Giacobazzi and Scozzari provide interesting characterisations of domains including Pos in terms of domain completion using natural domain operations [11].
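On the tuple representation of the earlier sketches, the solid-arc membership test for Pos described above is a three-line loop (ours):

    def is_positive(r):
        """Follow solid arcs from the root; the function is in Pos iff we reach 1."""
        while r not in (0, 1):
            r = r[1]               # r = (x, R1, R2); R1 is the solid branch
        return r == 1

The test inspects a single path, whereas, as argued above, computing the approximation Pos(ϕ) itself may have to mention every variable of V.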
The problem of approximating Boolean functions appears in many contexts in program analysis. We already mentioned Genaim and King's suspension analysis [9] and the formulation of set-sharing using Pos by Codish et al. [5]. Another possible application is in the design of widening operators for abstract interpretation-based analyses.

6 Conclusion

We have provided algorithms to find upper approximations for Boolean functions represented as ROBDDs. The algorithms all follow the same general pattern, which works for a large number of important classes of Boolean functions. They also provide a way of checking an ROBDD R for membership of a given class ∆: simply check whether R = ∆(R). In the design of our algorithms we have emphasised clarity rather than efficiency. We note that the critical term ∆(ϕ ∨ ψ) is identical to the join ∆(ϕ) ⊔∆ ∆(ψ), so in many cases, efficient approximation algorithms may boil down to efficient computation of the join. Future research will include a search for appropriate data structures and associated complexity analyses, as well as attempts at a more general and abstract approach to the algorithms and proofs.

References

1. T. Armstrong, K. Marriott, P. Schachte, and H. Søndergaard. Two classes of Boolean functions for dependency analysis. Science of Computer Programming, 31(1):3-45, 1998.
2. K. Brace, R. Rudell, and R. Bryant. Efficient implementation of a BDD package. In Proc. Twenty-Seventh ACM/IEEE Design Automation Conf., pages 40-45, 1990.
3. R. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. Computers, C-35(8):677-691, 1986.
4. R. Bryant. Symbolic Boolean manipulation with ordered binary-decision diagrams. ACM Computing Surveys, 24(3):293-318, 1992.
5. M. Codish, H. Søndergaard, and P. J. Stuckey. Sharing and groundness dependencies in logic programs. ACM Transactions on Programming Languages and Systems, 21(5):948-976, 1999.
6. P. Cousot and R. Cousot. Static determination of dynamic properties of recursive procedures. In E. J. Neuhold, editor, Formal Description of Programming Concepts, pages 237-277. North-Holland, 1978.
7. P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Proc. Sixth ACM Symp. Principles of Programming Languages, pages 269-282. ACM Press, 1979.
8. O. Ekin, S. Foldes, P. L. Hammer, and L. Hellerstein. Equational characterizations of Boolean function classes. Discrete Mathematics, 211:27-51, 2000.
9. S. Genaim and A. King. Goal-independent suspension analysis for logic programs with dynamic scheduling. In P. Degano, editor, Proc. European Symp. Programming 2003, volume 2618 of LNCS, pages 84-98. Springer, 2003.
10. R. Giacobazzi. Semantic Aspects of Logic Program Analysis. PhD thesis, University of Pisa, Italy, 1993.
11. R. Giacobazzi and F. Scozzari. A logical model for relational abstract domains. ACM Trans. Programming Languages and Systems, 20(5):1067-1109, 1998.
12. K. Glynn, P. J. Stuckey, M. Sulzmann, and H. Søndergaard. Exception analysis for non-strict languages. In Proc. 2002 ACM SIGPLAN Int. Conf. Functional Programming, pages 98-109. ACM Press, 2002.
13. P. R. Halmos. Lectures on Boolean Algebras. Springer-Verlag, 1963.
14. A. Heaton, M. Abo-Zaed, M. Codish, and A. King. A simple polynomial groundness analysis for logic programs. J. Logic Programming, 45(1-3):143-156, 2000.
15. J. M. Howe and A. King. Efficient groundness analysis in Prolog. Theory and Practice of Logic Programming, 3(1):95-124, 2003.
16. J. M. Howe, A. King, and L. Lu. Analysing logic programs by reasoning backwards. In M. Bruynooghe and K.-K. Lau, editors, Program Development in Computational Logic, volume 3049 of LNCS, pages 152-188. Springer, 2004.
17. K. Marriott and H. Søndergaard. Precise and efficient groundness analysis for logic programs. ACM Lett. Programming Languages and Systems, 2(1-4):181-196, 1993.
18. A. Mycroft. Abstract Interpretation and Optimising Transformations for Applicative Programs. PhD thesis, University of Edinburgh, Scotland, 1981.
19. O. Ore. Combinations of closure relations. Ann. Math., 44(3):514-533, 1943.
20. E. L. Post. The Two-Valued Iterative Systems of Mathematical Logic. Princeton University Press, 1941. Reprinted in M. Davis, Solvability, Provability, Definability: The Collected Works of Emil L. Post, pages 249-374, Birkhäuser, 1994.
21. S. Rudeanu. Boolean Functions and Equations. North-Holland, 1974.
22. T. J. Schaefer. The complexity of satisfiability problems. In Proc. Tenth Ann. ACM Symp. Theory of Computing, pages 216-226, 1978.
23. M. Ward. The closure operators of a lattice. Ann. Math., 43(2):191-196, 1942.

A CLP Method for Compositional and Intermittent Predicate Abstraction

Joxan Jaffar, Andrew E. Santosa, and Răzvan Voicu
School of Computing, National University of Singapore, S16, 3 Science Drive 2, Singapore 117543, Republic of Singapore
{joxan, andrews, razvan}@comp.nus.edu.sg

Abstract. We present an implementation of symbolic reachability analysis with the features of compositionality and intermittent abstraction, in the sense of performing approximation only at selected program points, if at all. The key advantages of compositionality are well known, while those of intermittent abstraction are that the abstract domain required to ensure convergence of the algorithm can be minimized, and that the cost of performing abstractions, now being intermittent, is reduced. We start by formulating the problem in CLP, and first obtain compositionality. We then address two key efficiency challenges. The first is that reasoning is required about the strongest-postcondition operator associated with an arbitrarily long program fragment. This essentially means dealing with constraints over an unbounded number of variables describing the states between the start and end of the program fragment at hand. This is addressed by using the variable elimination or projection mechanism that is implicit in CLP systems. The second challenge is termination, that is, to determine which subgoals are redundant. We address this by a novel formulation of memoization called coinductive tabling. We finally evaluate the method experimentally. At one extreme, where abstraction is performed at every step, we compare against a model checker. At the other extreme, where no abstraction is performed, we compare against a program verifier. Of course, our method provides for the middle ground, with a flexible combination of abstraction and Hoare-style reasoning with predicate transformers and loop invariants.

1 Introduction

Predicate abstraction [15] is a successful method of abstract interpretation. The abstract domain, constructed from a given finite set of predicates over program variables, is intuitive and easily, though not necessarily efficiently, computable within a traversal method of the program's control flow structure. While it is generally straightforward to optimize the process of abstraction to a certain extent by performing abstraction at selected points only (e.g., several consecutive assignments may be compressed and abstraction performed across one composite assignment, as implemented in the BLAST system [19]), to this point there has not been a systematic way of doing this.
Moreover, since the abstract description is limited to a fixed number of variables, such an ad hoc method would not be compositional. For example, [2] requires an elaborate extension of predicate abstraction which essentially considers a second set of variables (called "symbolic constants") in order to describe the behaviour of a function in the language of predicate abstraction. This provides only a limited form of compositionality.

(a) The program:

0  i := 0; c := 0
1  while (i < n) do
2    i++
3    c++
4    c++
5  end
6

(b) Its constraint transitions:

even(0, i, n, c) → even(1, i1, n1, c1), i1 = 0, n1 = n, c1 = 0.
even(1, i, n, c) → even(2, i1, n1, c1), i1 = i, n1 = n, i < n, c1 = c.
even(2, i, n, c) → even(3, i1, n1, c1), i1 = i + 1, n1 = n, c1 = c.
even(3, i, n, c) → even(4, i1, n1, c1), i1 = i, n1 = n, c1 = c + 1.
even(4, i, n, c) → even(5, i1, n1, c1), i1 = i, n1 = n, c1 = c + 1.
even(5, i, n, c) → even(2, i1, n1, c1), i1 = i, n1 = n, c1 = c, i < n.
even(5, i, n, c) → even(6, i1, n1, c1), i1 = i, n1 = n, c1 = c, i ≥ n.

Fig. 1. Even counts

In this paper, we present a way of engineering a general proof method of program reasoning based on predicate abstraction in which the process of abstraction is intermittent, that is, approximation is performed only at selected program points, if at all. There is no restriction on when abstraction is performed, even though termination issues will usually restrict the choices. The key advantages are that (a) the abstract domain required to ensure convergence of the algorithm can be minimized, and (b) the cost of performing abstractions, now being intermittent, is reduced.

For example, to reason that x = 2 after executing x := 0; x++; x++, one needs to know that x = 1 holds before the final assignment. Thus, in a predicate abstraction setting, the abstract domain must contain the predicate x = 1 for the above reasoning to be possible. Also, consider proving c = 2n for the program snippet in Figure 1a. A textbook Hoare-style loop invariant for the loop is c = 2i. Having this formula in the abstract domain would, however, not suffice; one in fact needs to know that c = 2i − 1 holds in between the two increments to c. Thus in general, a proper loop invariant is useful only if we can propagate its information throughout the program exactly.
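For concreteness, the snippet x := 0; x++; x++ can be written in the constraint-transition style of Figure 1(b); the predicate name p and the program points 0 through 3 here are ours:

  p(0, x) → p(1, x1), x1 = 0.
  p(1, x) → p(2, x1), x1 = x + 1.
  p(2, x) → p(3, x1), x1 = x + 1.

Propagating the constraint of the first rule exactly through the second yields x = 1 at point 2, which is precisely the intermediate fact that a pure predicate-abstraction domain would otherwise have to anticipate with the predicate x = 1.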
A main challenge with exact propagation is that reasoning will be required about the strongest-postcondition operator associated with an arbitrarily long program fragment. This essentially means dealing with constraints over an unbounded number of variables describing the states between the start and end of the program fragment at hand. The advantages in terms of efficiency, however, are significant: fewer predicates are needed in the abstract domain, and the abstraction operation is executed less frequently.

Alternatively, it may be argued that using the weakest-precondition operator for exact propagation may result in a set of constraints over a constant number of variables, and thus circumvent the challenge mentioned above. To see that this is not true, let us consider the following program fragment:

while (x%7 != 0) x++ ; while (x%11 != 0) x++

Also, let us assume that we have an exact propagation algorithm, based on either the weakest-precondition or the strongest-postcondition propagation operator, which computes a constraint that reflects the relationship between the values of x before and after the execution of the program fragment. Our algorithm needs to record the fact that between the two while loops the value of x is a multiple of 7. This cannot be done without introducing an auxiliary variable in the set of constraints. Assume now that this program fragment appears in the body of another loop. Since that (outer) loop may be traversed multiple times in the analysis process, and every traversal of the loop will introduce a new auxiliary variable, the number of auxiliary variables is potentially unbounded, irrespective of the propagation operator that is used.

An important feature of our proof method is that it is compositional. We represent a proof as a Hoare-style triple which, for a given program fragment, relates the input values of the variables to the output values. This is represented as a formula, and in general, such a formula must contain auxiliary variables in addition to the program variables. This is because it is generally impossible to represent the projection of a formula using a predefined set of variables, or equivalently, it is not possible to perform quantifier elimination. Consequently, in order to have unrestricted composition of such proofs, it is (again) necessary to deal with an unbounded number of variables.

The paper is organized as follows. We start by formulating the problem in CLP, and first obtain compositionality. We then address two key efficiency challenges. The first is that reasoning is required about the strongest-postcondition operator associated with an arbitrarily long program fragment. This means dealing with constraints over an unbounded number of variables describing the states between the start and end of the program fragment at hand. We address this problem by using the variable elimination or projection mechanism that is implicit in CLP systems. The second challenge is termination, which translates into determining the redundancy of subgoals. We address this by a novel formulation of memoization called coinductive tabling. We finally evaluate the method experimentally. At one extreme, where abstraction is performed at every step, we compare against the model checker BLAST [19]. Here we employ a standard realization of intermittence by abstracting at prespecified points, and thus our algorithm becomes automatic. At the other extreme, where no abstraction is performed (but where invariants are used to deal with loops), we compare against the program verifier ESC/Java [6]. Of course, our method provides for the middle ground, with a flexible combination of abstraction and Hoare-style reasoning with predicate transformers and loop invariants.

In summary, we present a CLP-based proof method which has the properties of being compositional, and which employs intermittent abstraction. The major technical contributions, toward this goal, are: the CLP formulation of the proof obligation, which provides expressiveness and compositionality; a coinduction principle, which provides the basic mechanism for termination; and engineering the use of the underlying CLP projection mechanism in the process of exact propagation. Our method thus provides a flexible combination of abstraction and Hoare-style reasoning with predicate transformers and loop invariants, which is compositional, and whose practical implementation is feasible.

1.1 Related Work

An important category of tools that use program verification technology has been developed within the framework of the Java Modelling Language (JML) project. JML allows one to specify a Java method's pre- and post-conditions, and class invariants. Examples of such program verification tools are Jack [11], ESC/Java2 [6], and Krakatoa [24]. All these tools employ weakest-precondition/strongest-postcondition
Our method thus provides a flexible combination of abstraction and Hoare-style reasoning with predicate transformers and loop invariants; it is compositional, and its practical implementation is feasible.

1.1 Related Work

An important category of tools that use program verification technology has been developed within the framework of the Java Modelling Language (JML) project. JML allows one to specify a Java method's pre- and post-conditions, and class invariants. Examples of such program verification tools are Jack [11], ESC/Java2 [6], and Krakatoa [24]. All these tools employ weakest precondition/strongest postcondition calculi to generate proof obligations which reflect whether the given post-conditions and class invariants hold at the end of a method whenever the corresponding preconditions are valid at the procedure's entry point. The resulting proof obligations are subsequently discharged by theorem provers such as Simplify [6], Coq [3], PVS [27], or HOL Light [18]. While these systems perform exact propagation, they depend on user-provided loop invariants, as opposed to an abstract domain.

Cousot and Cousot [7] recognized a long time ago that coarse-grained abstractions are better than fine-grained ones. Moreover, there have recently emerged systems based on abstract interpretation, and in particular, on predicate abstraction. Some examples are BLAST [19], SLAM [1], MAGIC [5], and Murphi [8], amongst others. While abstract interpretation is central, these systems employ a further technique of automatically determining the abstract domain needed for a given assertion. This technique iteratively refines the abstract domain based on information derived from previous counterexamples. These systems do not perform exact propagation in a systematic way.

The use of CLP for program reasoning is not new (see for example [14] for a non-exhaustive survey). Due to its capability for handling constraints, CLP has notably been used in the verification of infinite-state systems [9, 10, 13, 17, 23], although results for finite-state systems are also available [26, 12]. Indeed, it is generally straightforward to represent program transitions as CLP rules, and to use the CLP operational model to prove assertions stated as CLP goals. What is novel in our CLP formulation is, firstly, the compositional assertion, and then, coinductive tabling. More importantly, our formulation considers CLP programs, assertions and tabling in full generality.

2 Preliminaries

Apart from a program counter k, whose values are program points, let there be n system variables ṽ = v1, · · ·, vn with domains D1, · · ·, Dn respectively. In this paper, we shall use just two example domains: that of integers, and that of integer arrays. We assume the number of system variables is larger than the number of variables required by any program fragment or procedure.

Definition 1 (States and Transitions). A system state (or simply state) is of the form (k, d1, · · ·, dn) where k is a program point and di ∈ Di, 1 ≤ i ≤ n, are values for the system variables. A transition is a pair of states.

In what follows, we define a language of first-order formulas. Let V denote an infinite set of variables, each of which has a type in D1, · · ·, Dn, let Σ denote a set of functors, and let Π denote a set of constraint symbols. A term is either a constant (0-ary functor) in Σ or of the form f(t1, · · ·, tm), m ≥ 1, where f ∈ Σ and each ti is a term, 1 ≤ i ≤ m.
A primitive constraint is of the form φ(t1, · · ·, tm) where φ is an m-ary constraint symbol and each ti is a term, 1 ≤ i ≤ m. A constraint is constructed from primitive constraints using logical connectives in the usual manner. Where Ψ is a constraint, we write Ψ(X̃) to denote that Ψ possibly refers to variables in X̃, and we write ∃̃Ψ(X̃) to denote the existential closure of Ψ(X̃) over variables distinct from those in X̃.

A substitution is a mapping which simultaneously replaces each variable in a term or constraint by some expression. Where e is a term or constraint, we write eθ to denote the result of applying θ to e. A renaming maps each variable in a given sequence, say X̃, into the corresponding variable in another given sequence, say Ỹ. We write [X̃ → Ỹ] to denote such a mapping. A grounding substitution, or simply grounding, maps each variable of an expression into a ground term representing a value in its respective domain. We denote by [[e]] the set of all possible groundings of e.

3 Constraint Transition Systems

A key concept is that a program fragment P operates on a sequence of anonymous variables, each corresponding to a system variable at various points in the computation of P. In particular, we consider two sequences x̃ = x1, · · ·, xn and x̃t = xt1, · · ·, xtn of anonymous variables to denote the system values before executing P and at the "target" point(s) of P, respectively. Typically, but not always, the target point is the terminal point of P. Our proof obligation or assertion is then of the form {Ψ(x̃)} P {Ψ1(x̃, x̃t)} where Ψ and Ψ1 are constraints over the said variables, possibly including new variables. Like the Hoare triple, this states that if P is executed in a state satisfying Ψ, then all states at the target points (if any) satisfy Ψ1. Note that, unlike the Hoare triple, P may be nonterminating and Ψ1 may refer to the states of a point that is reached infinitely often. We will formalize all this below.

For example, let there be just one system variable x, let P be <0> x := x + 1 <1>, and let the target point be <1>. Then {true}P{xt = x + 1} holds, meaning P is the successor function on x. Similarly, if P were the (perpetual) program <0> while (true) x := x + 2 <1> endwhile <2>, and if <1> were the target point, then {true}P{xt = x + 2z} holds; that is, any state (1, x) at point <1> satisfies ∃z(xt = x + 2z). This shows, amongst other things, that the parity of x always remains unchanged.

Our proof method accommodates concurrent programs with a fixed number of processes. Where we have n processes, we shall use as a program point a sequence of n program points, so that the ith program point is one which comes from the ith process, 1 ≤ i ≤ n.

We next represent the program fragment P as a transition system which can be executed symbolically. The following key definition serves two main purposes. First, it is a high-level representation of the operational semantics of P; in fact, it represents its exact trace semantics. Second, it is an executable specification against which an assertion can be checked.

Definition 2 (Constraint Transition System). A constraint transition of P is a formula p(k, x̃) → p(k1, x̃1), Ψ(x̃, x̃1) where k and k1 are variables over program points, each of x̃ and x̃1 is a sequence of variables representing a system state, and Ψ is a constraint over x̃ and x̃1, and possibly some additional auxiliary variables.
A constraint transition system (CTS) of P is a finite set of constraint transitions of P. The symbol p is called the CTS predicate of P.

In what follows, unless otherwise stated, we shall consistently denote by P the program of interest, and by p its CTS predicate.

Process 1:
while (true) do
0  x := y + 1
1  await (x<y ∨ y=0)
2  x := 0
end

Process 2:
while (true) do
0  y := x + 1
1  await (y<x ∨ x=0)
2  y := 0
end

Fig. 2. Two Process Bakery

bak(0, p2, x, y) → bak(1, p2, x1, y), x1 = y + 1.
bak(1, p2, x, y) → bak(2, p2, x, y), x < y ∨ y = 0.
bak(2, p2, x, y) → bak(0, p2, x1, y), x1 = 0.
bak(p1, 0, x, y) → bak(p1, 1, x, y1), y1 = x + 1.
bak(p1, 1, x, y) → bak(p1, 2, x, y), y < x ∨ x = 0.
bak(p1, 2, x, y) → bak(p1, 0, x, y1), y1 = 0.

Fig. 3. CTS of Two Process Bakery

Consider for example the program in Figure 1a; call it Even. Figure 1b shows a CTS for Even, whose CTS predicate is even. Consider another example: the Bakery algorithm with two processes in Figure 2. A CTS for this program, call it Bak, is given in Figure 3. Note that we use the first and second arguments of the term bak to denote the program points of the first and second process, respectively.

Clearly the variables in a constraint transition may be renamed freely, because their scope is local to the transition. We thus say that a constraint transition is a variant of another if one is identical to the other when a renaming substitution is performed. Further, we may simplify a constraint transition by renaming any one of its variables x to an expression y, provided that x = y in all groundings of the constraint transition. For example, we may simplify the last constraint transition in Figure 3 into bak(p1, 2, x, y) → bak(p1, 0, x, 0) by replacing the variable y1 in the original transition with 0.

The above formulation of program transitions is familiar in the literature for the purpose of defining a set of transitions. What is new, however, is how we use a CTS to define symbolic transition sequences, and thereon, the notion of a proof. By similarity with logic programming, we use the term goal to denote a formula that can be subjected to an unfolding process in order to infer a logical consequence.

Definition 3 (Goal). A query or goal of a CTS is of the form p(k, x̃), Ψ(x̃), where k is a program point, x̃ is a sequence of variables over system states, and Ψ is a constraint over some or all of the variables x̃, and possibly some additional variables. The variables x̃ are called the primary variables of this goal, while any additional variable in Ψ is called an auxiliary variable of the goal.

Thus a goal is just like the conclusion of a constraint transition. We say the goal is a start goal if k is the start program point. Similarly, a goal is a target goal if k is the target program point.

Fig. 4. Proof Tree of 2-Process Bakery Algorithm (Partially Shown): a tree of goals of the form bak(k1, k2, x, y) with their accumulated constraints, rooted at the goal bak(0, 0, x, y), x = 0, y = 0.
Running a start goal is tantamount to asking the question: which values of x̃ satisfying ∃̃Ψ(x̃) will lead to a goal at the target point(s)? The idea is that we successively reduce one goal to another until the resulting goal is at a target point, and then inspect the results. Next we define the meaning of proving a goal against a CTS.

Definition 4 (Proof Step, Sequence and Tree). Let there be a CTS for p, and let G = p(k, x̃), Ψ be a goal for it. A proof step from G is obtained via a variant p(k, ỹ) → p(k1, ỹ1), Ψ1 of a transition in the CTS in which all the variables are fresh. The result is a goal of the form p(k1, ỹ1), Ψ, x̃ = ỹ, Ψ1, provided that the constraints Ψ, x̃ = ỹ, Ψ1 are satisfiable. A proof sequence is a finite or infinite sequence of proof steps. A proof tree is defined from proof sequences in the obvious way. A tree is complete if every internal node representing a goal G is succeeded by nodes representing every goal obtainable in a proof step from G.

Consider again the CTS in Figure 1b, and suppose we wish to prove {n = 1}p{c = 2}. There is in fact only one proof sequence from the start goal even(0, i, n, c), n = 1, c = 0, or equivalently, even(0, i, 1, 0). This proof sequence is shown in Figure 5; note that the counter, represented in the last goal by the variable c2, has the value 2.

even(0, i, n, c), n = 1, c = 0
even(1, i1, n, c), i1 = 0, n = 1, c = 0
even(2, i1, n, c), i1 = 0, n = 1, c = 0
even(3, i2, n, c), i2 = 1, n = 1, c = 0
even(4, i2, n, c1), i2 = 1, n = 1, c1 = 1
even(5, i2, n, c2), i2 = 1, n = 1, c2 = 2
even(6, i2, n, c2), i2 = 1, n = 1, c2 = 2

Fig. 5. Proof Tree of Even Counts Program

Definition 5 (Assertion). Let p be a program with start variables x̃, and let Ψ be a constraint. Let x̃t denote a sequence of variables representing system states not appearing in p or Ψ. (These represent the target values of the system variables.) An assertion for p with respect to x̃t is of the form p(k, x̃), Ψ |= Ψ1(x̃, x̃t). In particular, when k is the start program point, we may abbreviate the assertion using the notation {Ψ}p{Ψ1}.

It is intuitively clear what it means for an assertion to hold: execution from every instance θ of p(k, x̃), Ψ cannot lead to a target state where the property Ψ1(x̃θ, x̃t) is violated. In the example above, we could prove the assertion even(0, i, n, c) |= ct = 2n, where it is understood that the final variable ct corresponds to the start variable c. Note that the last occurrence of n in the assertion means that we are comparing ct with the initial and not the final value of n (though in this example, the two are in fact the same).

We now state the essential property of proof sequences:

Theorem 1. Let a CTS for p have the start point k and target point kt, and let x̃ and x̃1 each be sequences of variables over system states. The assertion {Ψ(x̃)} p {Ψ1(x̃t, x̃)} holds if for any goal of the form p(kt, x̃1), Ψ2(x̃1, x̃) appearing in a proof sequence from the goal p(k, x̃), Ψ(x̃), the following holds: ∃̃Ψ2(x̃1, x̃) |= ∃̃Ψ1(x̃1, x̃).

The above theorem provides the basis of a search method, and what remains is to provide a means to ensure termination of the search. Toward this end, we next define the concepts of subsumption and coinduction, which allow the (successful) termination of proof sequences. However, these are generally insufficient. In the next section, we present our version of abstraction, whose purpose is to transform a proof sequence so that it is applicable to the termination criteria of subsumption and coinduction.
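As a sanity check of the proof sequence in Figure 5, the following small sketch (our own illustration, not the paper's system) executes the CTS transitions of Figure 1(b) on the concrete instance even(0, i, 1, 0); each step mirrors one proof step in the sense of Definition 4:

# Concrete execution of the Even CTS from the instantiated start goal
# even(0, i, 1, 0); states are tuples (k, i, n, c).
def step(state):
    k, i, n, c = state
    if k == 0:            return (1, 0, n, 0)      # i := 0; c := 0
    if k == 1 and i < n:  return (2, i, n, c)      # enter loop
    if k == 2:            return (3, i + 1, n, c)  # i++
    if k == 3:            return (4, i, n, c + 1)  # c++
    if k == 4:            return (5, i, n, c + 1)  # c++
    if k == 5 and i < n:  return (2, i, n, c)      # loop back
    if k == 5:            return (6, i, n, c)      # exit loop (i >= n)
    return None                                    # point 6: no transition

state = (0, 7, 1, 0)   # the initial value of i is irrelevant at point 0
while True:
    nxt = step(state)
    if nxt is None:
        break
    state = nxt
assert state == (6, 1, 1, 2)   # target goal: c = 2, matching Figure 5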
3.1 Subsumption

Consider a finite and complete proof tree from some start goal. A goal G in the tree is subsumed if there is a different path in the tree containing a goal G′ such that [[G]] ⊆ [[G′]]. The principle here is simply memoization: one may terminate the expansion of a proof sequence while constructing a proof tree upon encountering a subsumed goal.

3.2 Coinduction

The principle here is that, within one proof sequence, the proof obligation associated with the final goal may assume that the proof obligation of an ancestor goal has already been met. This can be formally explained as a principle of coinduction (see e.g. Appendix B of [25]). Importantly, this simple form of coinduction requires neither a base case nor a well-founded ordering. We shall simply demonstrate this principle by example. Suppose we had the transition p(0, x) → p(0, x′), x′ = x + 2 and we wished to prove the assertion p(0, x) |= even(xt − x), that is, that the difference between x and its final value is even. Consider the derivation step:

p(0, x) |= even(xt − x)
p(0, x′), x′ = x + 2 |= even(xt − x)

We may use, in the latter goal, the fact that the earlier goal satisfies the assertion. That is, we may reduce the obligation of the latter goal to even(xt − x′), x′ = x + 2 |= even(xt − x). It is now a simple matter of inferring whether this formula holds. In general practice, the application of coinduction testing is largely equivalent to testing whether one goal is simply an instance of another.

4 Abstraction

In the literature on predicate abstraction, the abstract description is a specialized data structure, and the abstraction operation serves to propagate such a structure through a small program fragment (a contiguous group of assignments, or a test), thereby obtaining another structure. The strength of this method is in the simplicity of using a finite set of predicates over the fixed number of program variables as a basis for the abstract description. We choose to follow this method. However, our abstract description shall not be a distinguished data structure. In fact, our abstract description of a goal is itself a goal.

Definition 6 (Abstraction). An abstraction A is applied to a goal. It is specified by a program point pc(A), a sequence of variables var(A) corresponding to a subset of the system variables, and finally, a finite set of constraints pred(A) over var(A), called the "predicates" of A. Let A be an abstraction and G be a goal p(k, x̃), Ψ where k = pc(A). Let x̃1 denote the subsequence of x̃ corresponding to the system variables var(A). Let x̄ denote the remaining subsequence of x̃. Without losing generality, we assume that x̃1 is an initial subsequence of x̃, that is, x̃ = x̃1, x̄. Then the abstraction A(G) of G by A is p(k, Z̃, x̄), Ψ, Ψ2[var(A) → Z̃], where Z̃ is a sequence of fresh variables renaming x̃1, and Ψ2 is the finite set of constraints {ψ2 ∈ pred(A) : Ψ |= ψ2[var(A) → x̃1]}.

For example, let A be such that pc(A) = 0, var(A) = {v1} and pred(A) = {v1 < 0, v1 ≥ 0}. That is, the first variable is to be abstracted into a negative or a nonnegative value. Let G be p(0, [x1, x2, x3]), x1 = x2, x2 = 1. Then the abstraction A(G) is a goal of the form p(0, [Z, x2, x3]), x1 = x2, x2 = 1, Z ≥ 0, which can be simplified into p(0, [Z, x2, x3]), x2 = 1, Z ≥ 0. Note that the original goal had ground instances p(0, [1, 1, n]) for all n, while the abstracted goal has the instances p(0, [m, 1, n]) for all n and all nonnegative m. Note that the second variable x2 has not been abstracted even though it is tightly constrained to the first variable x1. Note further that the value of x3 is unchanged; that is, the abstraction would allow any constraint on x3, had the example goal contained such a constraint, to be propagated.

Lemma 1. Let A be an abstraction and G a goal. Then [[G]] ⊆ [[A(G)]].

The critical point is that the abstraction of a goal has the same format as the goal itself. Thus an abstract goal has the expressive power of a regular goal, while yet containing a notion of abstraction that is sufficient to produce a finite-state effect. Once again, this is facilitated by the ability to reason about an unbounded number of variables.
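The abstraction operation of Definition 6 reduces to one entailment check per predicate. Here is a minimal sketch of the example above, using the Z3 solver's Python API (an illustration of the definition under our own encoding, not the paper's implementation):

# Abstraction of the goal p(0, [x1, x2, x3]), x1 = x2, x2 = 1 with
# pred(A) = {v1 < 0, v1 >= 0} instantiated at x1: keep each predicate
# entailed by the goal's constraint, renamed to a fresh variable Z.
from z3 import Ints, Solver, Not, And, substitute, unsat

x1, x2, x3, Z = Ints('x1 x2 x3 Z')
Psi = And(x1 == x2, x2 == 1)                     # constraint of the goal G

def entails(constraint, pred):
    s = Solver()
    s.add(constraint, Not(pred))
    return s.check() == unsat

kept = [p for p in [x1 < 0, x1 >= 0] if entails(Psi, p)]   # -> [x1 >= 0]
# A(G) is p(0, [Z, x2, x3]) with x1 now auxiliary and the kept predicates
# renamed to Z, i.e. Psi /\ Z >= 0 before simplification.
abstract_Psi = And(Psi, *[substitute(p, (x1, Z)) for p in kept])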
bub(0, i, j, t, n), n ≥ 0
bub(1, i1, j, t1, n), i1 = 0, t1 = 0
bub(8, i1, j, t1, n), i1 = 0, t1 = 0, 0 ≤ n ≤ 1   (Satisfies t1 = (n² − n)/2)
bub(2, i1, j, t1, n), i1 = 0, t1 = 0, n > 1   (A)
bub(2, i2, j, t2, n), i2 < n − 1, t2 = n × i2 − (i2² − i2)/2   (Intermittent abstraction)
bub(6, i2, j1, t3, n), i2 < n − 1, t3 = n × (i2 + 1) − ((i2 + 1)² − i2 − 1)/2   (Proof composition)
bub(7, i3, j1, t3, n), i3 < n, t3 = n × i3 − (i3² − i3)/2
bub(8, i3, j1, t3, n), i3 = n − 1, t3 = n × i3 − (i3² − i3)/2   (Satisfies t3 = (n² − n)/2)
bub(2, i3, j1, t3, n), i3 < n − 1, t3 = n × i3 − (i3² − i3)/2   (Coinduction using (A))

Fig. 6. Compositional Proof

0 t := 0; i := 0
1 while (i < n-1) do
2   j := 0
3   while (j < n-i-1) do
4     j := j+1; t := t+1
5   end
6   i := i+1
7 end
8

(a)

bub(0, i, j, t, n) → bub(1, i1, j, t1, n), i1 = 0, t1 = 0.
bub(1, i, j, t, n) → bub(8, i, j, t, n), i ≥ n − 1.
bub(1, i, j, t, n) → bub(2, i, j, t, n), i < n − 1.
bub(2, i, j, t, n) → bub(3, i, j1, t, n), j1 = 0.
bub(3, i, j, t, n) → bub(6, i, j, t, n), j ≥ n − i − 1.
bub(3, i, j, t, n) → bub(4, i, j, t, n), j < n − i − 1.
bub(4, i, j, t, n) → bub(5, i, j1, t1, n), j1 = j + 1, t1 = t + 1.
bub(5, i, j, t, n) → bub(6, i, j, t, n), j ≥ n − i − 1.
bub(5, i, j, t, n) → bub(4, i, j, t, n), j < n − i − 1.
bub(6, i, j, t, n) → bub(7, i1, j, t, n), i1 = i + 1.
bub(7, i, j, t, n) → bub(8, i, j, t, n), i ≥ n − 1.
bub(7, i, j, t, n) → bub(2, i, j, t, n), i < n − 1.

(b)

Fig. 7. Program "Bubble"

Consider the "Bubble" program and its CTS in Figures 7(a) and 7(b), which is a simplified skeleton of the bubble sort algorithm (without arrays). Consider the subprogram corresponding to start point 2 and whose target point is 6; that is, we are considering the inner loop. Further suppose that the following assertion has already been proven:

bub(2, i, j, t, n) |= it = i, tt = t + n − i − 1, nt = n

that is, the subprogram increments t by n − i − 1 while preserving both i and n, but not j.

Consider now a proof sequence for the goal bub(0, i, j, t, n), n ≥ 0, where we want to prove that at program point 8, t = (n² − n)/2. The proof tree is depicted in Figure 6. The proof shows a combination of the use of intermittent abstraction and compositional proof:

• At point (A), we abstract the goal bub(2, i1, j, t1, n), i1 = 0, t1 = 0, n > 1 using the predicates i < n − 1 and t = n × i − (i² − i)/2. Call this abstraction A. Here the set of variables is var(A) = {i, t}; hence both the variables i1 and t1, which correspond respectively to the system variables i and t, are renamed to fresh variables i2 and t2. Meanwhile, the variables j and n retain their original values.
• After performing the above abstraction, we reuse the proof of the inner loop above. Here we immediately move to program point 6, incrementing t by n − i − 1 and updating j to an unknown value. However, i and n retain their original values at 2.
• The result of the intermittent abstraction above is a coinductive proof.
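The subsumption and coinduction checks of Sections 3.1 and 3.2, which close paths such as the step marked "Coinduction using (A)" in Figure 6, amount in practice to an instance test between goals at the same program point, i.e., an entailment between their constraints. A minimal sketch with the Z3 solver's Python API, on two hypothetical goal constraints of our own choosing:

# Goal G at a program point is subsumed by ancestor goal G' at the same
# point iff G's constraint entails G''s, i.e. G /\ not G' is unsatisfiable.
from z3 import Ints, Solver, Not, And, unsat

i, n, c = Ints('i n c')
G  = And(i == 0, n > 1, c == 0)    # constraint of the later goal
Gp = And(i < n, c == 2 * i)        # constraint of the ancestor goal

s = Solver()
s.add(G, Not(Gp))
assert s.check() == unsat          # entailed: the path can be closed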
5 The Whole Algorithm

We now summarize our proof method for an assertion {Ψ}p{Ψ1}. Suppose the start program point of p is k and the start variables of p are x̃. Then consider the start goal p(k, x̃), Ψ and incrementally build a complete proof tree. For each path in the tree constructed so far leading to a goal G:

• if G is either subsumed or is coinductive, then consider this path closed, i.e., not to be expanded further;
• if G is a goal on which an abstraction A is defined, replace G by A(G);
• if G is a target goal, and the constraints on the primary variables x̃1 in G do not satisfy Ψ1θ, where θ renames the target variables in Ψ1 into x̃1, terminate and return false;
• if the expansion of the proof tree is no longer possible, terminate and return true.

Theorem 2. If the above algorithm, applied to the assertion {Ψ}p{Ψ1}, terminates and does not return false, then the assertion holds.

6 CLP Technology

It is almost immediate that a CTS is implementable in CLP. Given a CTS for p, we build a CLP program in the following way: (a) for every transition of the form p(k, x̃) → p(k′, x̃′), Ψ we use the CLP clause p(k, x̃) :- p(k′, x̃′), Ψ (assuming that Ψ is in the constraint domain of the CLP implementation at hand); (b) for every terminal program point k, we use the CLP fact p(k, _, . . ., _), where the number of anonymous variables is the same as the number of variables in x̃. We shall see later that the key implementation challenge for a CLP system is the incremental satisfiability problem. Roughly stated, this is the problem of successively determining that a monotonically increasing sequence of constraints (interpreted as a conjunction) is satisfiable.

6.1 Exact Propagation is "CLP-Hard"

Here we informally demonstrate that the incremental satisfiability problem is reducible to the problem of analyzing a straight-line path in a program. We consider here constraints in the form of linear Diophantine equations, i.e., multivariate polynomial equations over the integers. Without loss of generality, we assume each constraint is written in the form X = Y + Z or X = nY where n is an integer. Throughout this section, we denote by X, Y, Z logic variables, and by x, y, z their corresponding program variables, respectively. Suppose we already have a sequence of constraints Ψ0, · · ·, Ψi and a corresponding path in the program's control flow. Suppose we add a new constraint Ψi+1 = (X = Y + Z). Then, if one of these variables, say Y, is new, we add the assignment y := x − z, where y is a new variable created to correspond to Y. The remaining variables x and z are each either new, or are the variables corresponding to X and Z, respectively. If, however, all of X, Y and Z are not new, then we add the statement if (x = y + z) ... . Hereafter we pursue the then branch of this if statement. Similarly, suppose the new constraint were of the form X = nY. Again, if x is new, we simply add the assignment x := n * y, where x is newly created to correspond to X. Otherwise, we add the statement if (x = n * y) ... to the path, and again, we pursue the then branch of this if statement. Clearly, an exact analysis of the path we have constructed, leading to a successful traversal, requires, incrementally, the solving of the constraint sequence Ψ0, · · ·, Ψn.
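Returning to the translation scheme (a)–(b) at the start of this section, the following sketch (our own illustration; the clause layout and variable names are assumptions, not the output of the paper's tool) emits CLP-style clauses for the Even CTS of Figure 1(b):

# Emit CLP clauses from a CTS: one clause per transition (rule (a)) and
# one fact per terminal program point (rule (b)).
transitions = [
    (0, 1, "I1 = 0, N1 = N, C1 = 0"),
    (1, 2, "I1 = I, N1 = N, I < N, C1 = C"),
    (2, 3, "I1 = I + 1, N1 = N, C1 = C"),
    (3, 4, "I1 = I, N1 = N, C1 = C + 1"),
    (4, 5, "I1 = I, N1 = N, C1 = C + 1"),
    (5, 2, "I1 = I, N1 = N, C1 = C, I < N"),
    (5, 6, "I1 = I, N1 = N, C1 = C, I >= N"),
]

def clp_program(pred, transitions, terminals):
    clauses = [f"{pred}({k}, _, _, _)." for k in terminals]
    clauses += [f"{pred}({k}, I, N, C) :- {psi}, {pred}({k1}, I1, N1, C1)."
                for (k, k1, psi) in transitions]
    return "\n".join(clauses)

print(clp_program("even", transitions, terminals=[6]))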
6.2 Key Elements of CLP Systems

A CLP system attempts to find answers to an initial goal G by searching for valid substitutions of its variables, in a depth-first manner. Each path in the search tree in fact involves the solving of an incremental satisfiability problem. Along the way, unsatisfiability of the constraints at hand entails backtracking. The key issue in CLP is the incremental satisfiability problem, as mentioned above. A standard approach is as follows. Given that the sequence of constraints Ψ0, . . ., Ψi has been determined to be satisfiable, represent this fact in a solved form. Essentially, this means that when a new constraint Ψi+1 is encountered, the solved form can be combined efficiently with Ψi+1 in order to determine the satisfiability of the new conjunction of constraints. This method essentially requires a representation of the projection of a set of constraints onto certain variables. Consider, for example, the set x0 = 0, x1 = x0 + 1, x2 = x1 + 1, · · ·, xi = xi−1 + 1. Assuming that the new constraint would only involve the variable xi (and this happens very often), we desire a representation of xi = i. This projection problem is well studied in CLP systems [21]. In the system CLP(R) [22], for example, various adaptations of the Fourier-Motzkin algorithm were implemented for projection in Herbrand and linear arithmetic constraints.

We finally mention another important optimization in CLP: tail recursion. This technique uses the same space in the procedure call stack for recursive calls. Amongst other benefits, this technique allows for a potentially unbounded number of recursive calls. Tail recursion is particularly relevant in our context because the recursive calls arising from the CTS of programs are often tail-recursive. The CLP(R) system that we use to implement our prototype has been engineered to handle constraints and auxiliary variables efficiently using the above techniques.
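A minimal sketch of the solved-form idea (our own illustration, restricted to constraints of the shapes x = c and x = y + c with y already constrained; the real CLP(R) solver is far more general): the solved form here is simply the projection of the growing conjunction onto each variable.

# Incremental satisfiability with a trivial solved form: each variable is
# mapped to the constant it must equal, so adding a constraint is O(1).
class SolvedForm:
    def __init__(self):
        self.val = {}                     # variable -> entailed constant

    def add_eq_const(self, x, c):         # constraint x = c
        if x in self.val and self.val[x] != c:
            return False                  # conjunction became unsatisfiable
        self.val[x] = c
        return True

    def add_eq_inc(self, x, y, c):        # constraint x = y + c
        return self.add_eq_const(x, self.val[y] + c)

sf = SolvedForm()
sf.add_eq_const('x0', 0)
for i in range(1, 1001):
    sf.add_eq_inc(f'x{i}', f'x{i-1}', 1)  # xi = x(i-1) + 1
assert sf.val['x1000'] == 1000            # projection onto the last variable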
7 Experiments

7.1 Exact Runs

We start with an experiment which shows that concrete execution can potentially be less costly than abstract execution. To that end, we compare the timing of concrete execution using our CLP-based implementation against a predicate abstraction-based model checker. We run a simple looping program, whose C code is shown in Figure 8(a). First, we have BLAST generate all the 100 predicates it requires. We then re-run BLAST, providing these predicates. BLAST took 22.06 seconds to explore the state space.

int main()
{ int i=0, j, x=0;
  while (i<7) {
    j=0;
    while (j<7) { x++; j++; }
    i++; }
  if (x>49) { ERROR: }}

(a)

int main()
{ int i=0, j, x=0;
  while (i<50) {
    i++; j=0;
    while (j<10) { x++; j++; }
    while (x>i) { x--; }}
  if (x<50) { ERROR: }}

(b)

Fig. 8. Programs with Loop

On the same machine, and without any abstraction, our verification engine took only 0.02 seconds. For comparison, the SPIN model checker [20] executes the same program written in PROMELA in less than 0.01 seconds. Note that for all our experiments, we use a Pentium 4 2.8 GHz system with 512 MB RAM running GNU/Linux 2.4.22.

Next, consider the synthetic program consisting of an initial assignment x := 0 followed by 1000 increments to x, with the objective of proving that x = 1000 at the end. Consider also an alternative version where the program contains only a single loop which increments its counter x 1000 times. We input these two programs to our program verifier, without using abstraction, and to ESC/Java 2 as well. The results are shown in Table 1. For both our verifier and ESC/Java 2, we run with x initialized to 0 and with x not initialized, the latter hopefully forcing symbolic execution.

Table 1. Timing Comparison with ESC/Java

                    Time (in Seconds)
              CLP with Tabling    ESC/Java 2
              x==0      —         x==0      —
Non-Looping   2.45      2.47      9.89      9.68
Looping       22.05     21.95     1.00      1.00

Table 1 shows that our verifier runs faster for the non-looping version. However, there is a noticeable slowdown in the looping version for our implementation. This is caused by the fact that in our implementation of coinductive tabling, the subsumption check is done based on similarity of program points. Therefore, when a program point inside a loop is visited for the i-th time, there are i − 1 subsumption checks to be performed. This results in a total of about 500,000 subsumption checks for the looping program. In comparison, the non-looping version requires only 1,000 subsumption checks. However, our implementation is currently at a prototype stage, and our tabling mechanism is not implemented in the most efficient way. For the looping version, ESC/Java 2 employs a weakest precondition propagation calculus; since the program is very small, with a straightforward invariant (just the loop condition), the computation is very fast. Table 1 also shows that there is almost no difference between having x initialized to 0 or not.

7.2 Experiments Using Abstraction

Next we show an example demonstrating that the intermittent approach requires fewer predicates. Let us consider a second looping program written in C, shown in Figure 8(b). The program's postcondition can be proven by providing the invariant x=i ∧ i<50 before the first statement of the loop body of the outer while loop. For predicate abstraction, we supply the predicates x=i and i<50, and respectively their negations x≠i and i≥50, for that program point to our verifier. The proof process finishes in less than 0.01 seconds. If we do not provide an abstract domain, the verification process finishes in 20.34 seconds. Here, intermittent predicate abstraction requires fewer predicates. We also run the same program with BLAST and provide the predicates x=i and i<50 (BLAST automatically also considers their negations). BLAST finishes in 1.33 seconds, and in addition, it produces 23 other predicates through refinements. Running it again with all these predicates given, BLAST finishes in 0.28 seconds.

Further, we also tried our proof method on a version of the bakery mutual exclusion algorithm. We need abstraction, since the bakery algorithm is an infinite-state program. The pseudocode for process i is shown in Figure 9. Here we would like to verify mutual exclusion; that is, no two processes are in the critical section (program point 2) at the same time. Our version of the bakery algorithm is a concurrent program with asynchronous composition of processes. Nondeterminism due to concurrency can be encoded using nondeterministic choice. We encode the algorithm for 2, 3 and 4 processes in BLAST, where nondeterministic choice is implemented using the special variable BLAST_NONDET, which has a nondeterministic value.

while (true) do
0  xi := max(x_{j≠i}) + 1
1  await (∀j : j ≠ i → xi < xj ∨ xj = 0)
2  xi := 0
end

Fig. 9. Bakery Algorithm Pseudocode for Process i
When N is the number of processes, each of the programs has N variables pci, where 1 ≤ i ≤ N, each denoting the program point of process i; pci can only take a value from {0, 1, 2}. There are also N variables xi, each denoting the "ticket number" of a process. We also translate the BLAST code into a CTS. In our experiments, we attempt to verify the mutual exclusion property, that is, that no two processes can be in the critical section at the same time. Here we perform 3 sets of runs, each consisting of runs with 2, 3 and 4 processes. In all 3 sets, we use a basic set of predicates: xi=0, xi≥0, pci=0, pci=1, pci=2, where i = 1, . . ., N and N is the number of processes, and also their negations.

• Set 1: Use of predicate abstraction at every state with the full predicate set. We perform abstraction at every state encountered during search. In addition to the basic predicates, we also require the predicates shown in Table 2(a) (and their negations) to avoid spurious counterexamples.
• Set 2: Intermittent predicate abstraction with the full predicate set. We use intermittent abstraction in our prototype implementation. We abstract only when, for some process i, pci=1 holds. The set of predicates is as in the first set.
• Set 3: Intermittent predicate abstraction with a reduced predicate set. We use intermittent abstraction on our tabled CLP system. We only abstract whenever there are N−1 processes at program point 0 (in the 2-process sequential version this means either pc1=0 or pc2=0). For an N-process bakery algorithm, we only need the basic predicates and their negations, without the additional predicates shown in Table 2(a).

Table 2. Results of Experiments Using Abstraction. (a) Additional Predicates. (b) Timing Results.

(a) Additional Predicates
Bakery-2   x1<x2
Bakery-3   x1<x2, x1<x3, x2<x3
Bakery-4   x1<x2, x1<x3, x1<x4, x2<x3, x2<x4, x3<x4

(b) Time (in Seconds)
           CLP with Tabling          BLAST
           Set 1    Set 2    Set 3
Bakery-2   0.02     0.01     <0.01   0.17
Bakery-3   0.83     0.14     0.09    2.38
Bakery-4   131.11   8.85     5.02    78.47

We have also compared our results with BLAST. We supplied to BLAST the same set of predicates that we used in the first and second sets. Again, in BLAST we do not have to specify their negations explicitly. Interestingly, for the 4-process bakery algorithm BLAST requires even more predicates to avoid refinement, namely x1=x3+1, x2=x3+1, x1=x2+1, 1≤x4, x1≤x3, x2≤x3 and x1≤x2. We suspect this is due to the fact that the precision of predicate abstraction-based state-space traversal depends on the power of the underlying theorem prover. We have BLAST generate these additional predicates in a pre-run, and then run BLAST using them. Since we do not run BLAST with refinement, the lazy abstraction technique [19] has no effect, and BLAST uses all the supplied predicates to represent any abstract state. For these problems, using our intermittent abstraction with CLP tabling is also markedly faster than both full predicate abstraction with CLP and BLAST. We show our timing results in Table 2(b) (smallest recorded time of 3 runs each). The first set and BLAST both run with abstraction at every visited state. The timing difference between them and the second and third sets shows that performing abstraction at every visited state is expensive. The third set shows further gain over the second when we understand some intricacies of the system.
Acknowledgement. We thank Ranjit Jhala for help with BLAST.

References

1. T. Ball, R. Majumdar, T. Millstein, and S. K. Rajamani. Automatic predicate abstraction of C programs. In 15th PLDI, pages 203–213. ACM Press, May 2001. SIGPLAN Notices 36(5).
2. T. Ball, T. Millstein, and S. K. Rajamani. Polymorphic predicate abstraction. ACM Transactions on Programming Languages and Systems, 27(2):314–343, 2005.
3. B. Barras, S. Boutin, C. Cornes, J. Courant, J. Filliâtre, E. Giménez, H. Herbelin, G. Huet, C. Muñoz, C. Murthy, C. Parent, C. Paulin, A. Saïbi, and B. Werner. The Coq proof assistant reference manual—version V6.1. Technical Report 0203, INRIA, 1997.
4. A. Bossi, editor. LOPSTR '99, volume 1817 of LNCS. Springer, 2000.
5. S. Chaki, E. Clarke, A. Groce, S. Jha, and H. Veith. Modular verification of software components in C. IEEE Transactions on Software Engineering, 30(6):388–402, June 2004.
6. D. R. Cok and J. Kiniry. ESC/Java2: Uniting ESC/Java and JML. In G. Barthe, L. Burdy, M. Huisman, J.-L. Lanet, and T. Muntean, editors, CASSIS 2004, volume 3362 of LNCS, pages 108–128. Springer, 2005.
7. P. Cousot and R. Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation. In M. Bruynooghe and M. Wirsing, editors, PLILP '92, volume 631 of LNCS.
8. S. Das, D. L. Dill, and S. Park. Experience with predicate abstraction. In N. Halbwachs and D. Peled, editors, 11th CAV, volume 1633 of LNCS, pages 160–171. Springer, 1999.
9. G. Delzanno and A. Podelski. Model checking in CLP. In R. Cleaveland, editor, 5th TACAS, volume 1579 of LNCS, pages 223–239. Springer, 1999.
10. X. Du, C. R. Ramakrishnan, and S. A. Smolka. Tabled resolution + constraints: A recipe for model checking real-time systems. In 21st RTSS. IEEE Computer Society Press, 2000.
11. L. Burdy et al. Java applet correctness: A developer-oriented approach. In K. Araki, S. Gnesi, and D. Mandrioli, editors, FME 2003, volume 2805 of LNCS.
12. Y. S. Ramakrishna et al. Efficient model checking using tabled resolution. In Grumberg [16].
13. F. Fioravanti, A. Pettorossi, and M. Proietti. Verifying CTL properties of infinite-state systems by specializing constraint logic programs. In M. Leuschel, A. Podelski, C. R. Ramakrishnan, and U. Ultes-Nitsche, editors, 2nd VCL, pages 85–96, 2001.
14. L. Fribourg. Constraint logic programming applied to model checking. In Bossi [4], pages 30–41.
15. S. Graf and H. Saïdi. Construction of abstract state graphs of infinite systems with PVS. In Grumberg [16], pages 72–83.
16. O. Grumberg, editor. CAV '97, Proceedings, volume 1254 of LNCS. Springer, 1997.
17. G. Gupta and E. Pontelli. A constraint-based approach for specification and verification of real-time systems. In 18th RTSS, pages 230–239. IEEE Computer Society Press, 1997.
18. J. Harrison. HOL Light: A tutorial introduction. In M. K. Srivas and A. J. Camilleri, editors, 1st FMCAD, volume 1166 of LNCS, pages 265–269. Springer, 1996.
19. T. A. Henzinger, R. Jhala, and R. Majumdar. Lazy abstraction. In 29th POPL, pages 58–70. ACM Press, 2002. SIGPLAN Notices 37(1).
20. G. J. Holzmann. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley, 2003.
21. J. Jaffar, M. Maher, P. Stuckey, and R. Yap. Projecting CLP(R) constraints. New Generation Computing, 11:449–469, 1993.
22. J. Jaffar, S. Michaylov, P. J. Stuckey, and R. H. C. Yap. The CLP(R) language and system. ACM TOPLAS, 14(3):339–395, 1992.
23. M. Leuschel and T. Massart. Infinite-state model checking by abstract interpretation and program specialization. In Bossi [4].
24. C. Marché, C. Paulin-Mohring, and X. Urbain. The KRAKATOA tool for certification of JAVA/JAVACARD programs annotated in JML. J. Log. and Alg. Prog., 58(1–2):89–106, 2004.
25. F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer, 1999.
26. U. Nilsson and J. Lübcke. Constraint logic programming for local and symbolic model checking. In J. W. Lloyd et al., editors, 1st CL, volume 1861 of LNCS, pages 384–398. Springer, 2000.
27. S. Owre, N. Shankar, and J. Rushby. PVS: A prototype verification system. In D. Kapur, editor, 11th CADE, volume 607 of LNCS, pages 748–752. Springer, 1992.

Combining Shape Analyses by Intersecting Abstractions

Gilad Arnold¹, Roman Manevich²٬*, Mooly Sagiv², and Ran Shaham

¹ University of California, Berkeley
arnold@eecs.berkeley.edu
² Tel Aviv University
{rumster, msagiv}@tau.ac.il
ran.shaham@gmail.com

* This research was supported in part by the Clore Fellowship Programme.

Abstract. We consider the problem of computing the intersection (meet) of heap abstractions. This problem is useful, among other applications, for relating abstract memory states computed by a forward analysis with abstract memory states computed by a backward analysis. Since dynamically allocated heap objects have no static names, relating objects computed by different analyses cannot be done directly. We show that the problem of computing meet is computationally hard. We describe a constructive formulation of meet based on certain relations between abstract heap objects. The problem of enumerating those relations is reduced to finding constrained matchings in graphs. We implemented the algorithm in the TVLA system and used it to prove temporal heap properties of several small Java programs, and obtained empirical evidence showing the effectiveness of the meet algorithm.

1 Introduction

This research is motivated by the need to approximate temporal properties of programs manipulating dynamically allocated data structures; for example, statically identifying a point in the program after which a list element will never be accessed, and can thus be deallocated. As it is undecidable, in general, to prove interesting properties about programs with dynamic memory allocation, pointers, and destructive updates, the use of abstract interpretation [2] to compute an over-approximation of a program's operational semantics is a fundamental practice underlying this work. Thus, while proving some correct program properties may fail, every proved property is assured to hold.

We are interested in inferring persistent [6] temporal properties of heaps. These are properties that continuously hold from a given point in the trace. Inferring persistent temporal properties is naturally done in two phases, where the first phase over-approximates the shapes of the data structures using a forward analysis starting at the entry node, and the second phase computes heap liveness using a backward analysis starting at the exit node. Notice that this generalizes the process of computing scalar liveness in compilers, in which the first phase is unnecessary in the absence of pointers and arrays. We call this approach Phased Bidirectional Analysis.
The problem of integrating the forward phase with the backward phase is challenging, since the exact memory locations are lost by the abstraction. Therefore, this paper addresses the problem of computing the intersection of heap abstractions. When applied to a set of elements of some abstract domain (lattice), this operator—commonly referred to as meet—yields the greatest lower bound of the elements in the set. Specifically, for two heap abstractions, the corresponding meet is the set of common stores that are represented by both of its operands.

The main contributions of this paper are summarized as follows:

1. We prove that meet is computationally hard for the abstract domain of bounded structures (Theorem 3), which is used by the TVLA system, by showing a reduction from the problem of 3-colorability on graphs to deciding whether the output of meet is empty. This result is a bit surprising, since structures in this domain have unique "canonical names", which makes isomorphism checking and checking of embedding (subsumption) decidable in polynomial time.
2. We present a new algorithm to compute the meet of 3-valued structures. We define the concept of correspondence relations between abstract heap objects and explain how to compute meet from these relations. We then develop a strategy for finding correspondence relations that manages to prune many of the irrelevant relations, thus making the algorithm efficient in practice.
3. We have implemented the meet algorithm in TVLA—a system for generating program analyses from operational semantics [5]—and used it to implement a new analysis for detecting program locations where heap objects and reference fields become unused in Java programs. The information discovered by the analysis can be used to improve memory management. The analysis combines forward and backward information and proves to be precise enough for several small but interesting programs operating on list data structures. The empirical results show that our analysis is precise enough to reclaim memory as soon as it becomes unneeded. Therefore, our algorithm can serve as a reference algorithm for compile-time garbage collection.

Our experiments indicate that the heuristics used by the meet algorithm make it very effective in combining shape analyses; the time and space performance of the algorithm is typically related to the size of the input and output by a linear factor. However, our current prototype implementation is slow and was only applied to small programs.

Running Example. Fig. 1 shows a simple program in a Java-like language that prints the elements of a singly-linked list. This program serves as the running example in this paper. The goal of the analysis here is to discover the earliest points where reference variables and reference fields are no longer used. Specifically, we would like to find: (a) that reference variable x is never used after line 7 (this is rather trivial, since x does not appear later), and (b) that the reference field n of the object pointed-to by y is never used after line 10.

[1] x = null;
[2] while (...) {
[3]   y = new SLL();
[4]   y.val = ...;
[5]   y.n = x;
[6]   x = y; }
[7] y = x; // can insert "x = null;" here
[8] while (y != null) {
[9]   System.out.print(y.val);
[10]  t = y.n; // can insert "free y;" or "y.n = null;" here
[11]  y = t; }

Fig. 1. A program that creates a singly-linked list and traverses its elements
The second fact is more challenging to prove, as the object pointed-to by y is different on every iteration of the loop.

Outline. The rest of the paper is organized as follows. Section 2 gives an overview of program analysis of heap-manipulating programs using 3-valued logic. In Section 3 we explain how to approximate temporal properties of heaps with meet. In Section 4, we present our algorithm for meet. Section 5 describes our experiments with an analyzer that infers compile-time garbage collection information in Java programs by using meet. Section 6 discusses related work. All proofs, as well as detailed examples, appear in [1].

2 3-Valued Shape Analysis Overview

In this section we explain the representation of concrete program states and their abstractions, based on the parametric analysis framework of [7].

2.1 Concrete Program States

We represent concrete program states by 2-valued logical structures.

Definition 1. A 2-valued logical structure over a vocabulary (set of predicates) P is a pair S = ⟨U, ι⟩ where U is the universe of the 2-valued structure, and ι is the interpretation function mapping predicates to their truth-values in the structure: for every predicate p ∈ P of arity k, ι(p) : U^k → {0, 1}.

In this paper, we assume that the set of predicates includes the binary predicate eq, and insist that it is interpreted as equality between individuals. Table 1 shows the predicates used to record properties of individuals for the shape analysis of the running example (forward phase). We denote the set of all 2-valued logical structures over a set of predicates P by 2-STRUCT[P]. In the sequel, we assume that the vocabulary P is fixed, and abbreviate 2-STRUCT[P] to 2-STRUCT.

Table 1. Predicates used for shape analysis of the running example, and their meaning. The set PVar stands for the set of reference variables {x, y, t}.

Predicate                    Intended Meaning
eq(v1, v2)                   Is v1 equal to v2?
{x(v) : x ∈ PVar}            Does reference variable x point to object v?
n(v1, v2)                    Does the n field of object v1 point to object v2?
{rx,n(v) : x ∈ PVar}         Is v reachable from reference variable x along n fields?
is(v)                        Do two or more fields of heap elements point to v?
cn(v)                        Is v on a directed cycle of n fields?

Concrete states (2-valued logical structures) are depicted as directed graphs. Each individual of the universe is drawn as a node. A unary predicate p(u) which holds for an individual u appears next to the corresponding node. If a unary predicate represents a reference variable, it is shown by having an arrow drawn from its name to the node referenced by the variable. The binary predicate n(u1, u2), which holds for a pair of individuals u1 and u2, is drawn as a directed edge from u1 to u2 labeled n. The predicate eq is not drawn, since any two nodes are different and every node is equal to itself.

Fig. 2. (a) A concrete program state arising after the execution of the statement t = y.n; (b) an abstract program state approximating the concrete state in (a). (The original figure shows the two shape graphs over the list nodes, annotated with the predicates x, y, t and rx,n, ry,n, rt,n.)

Fig. 2(a) shows a concrete program state arising after the execution of the statement t = y.n on line 10 of the running example in Fig. 1.

2.2 Abstract Program States

The abstract program states we use are based on 3-valued logic [7], which extends boolean logic by introducing a third value 1/2, denoting values that may be either 0 or 1. In particular, we utilize the partially ordered set {0, 1, 1/2}, where 0 ⊑ 1/2 and 1 ⊑ 1/2, with the join operation ⊔ defined by x ⊔ y = x if x = y, and x ⊔ y = 1/2 otherwise.
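For concreteness, here is a tiny sketch of this information order and join, in our own encoding of 1/2 as the string 'half' (an illustration, not TVLA's internal representation):

# Kleene values: 0, 1, and 1/2 (encoded here as 'half').
HALF = 'half'

def leq(x, y):
    """Information order: x is below y iff x == y or y is 1/2."""
    return x == y or y == HALF

def join(x, y):
    """Least upper bound: x if the values agree, 1/2 otherwise."""
    return x if x == y else HALF

assert join(0, 0) == 0 and join(0, 1) == HALF and leq(1, HALF)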
Definition 2. A 3-valued logical structure over a set of predicates P is a pair S = (U, ι) where U is the universe of the 3-valued structure, and ι is the interpretation function mapping predicates to their truth-values in the structure: for every predicate p ∈ P of arity k, ι(p) : U^k → {0, 1, 1/2}.

An abstract state may include summary nodes, i.e., individuals which correspond to one or more individuals in a concrete state represented by that abstract state. A summary node u has eq(u, u) = 1/2, indicating that it may represent more than a single individual.

Abstract states (3-valued logical structures) are also depicted as directed graphs, where unary predicates denoting reference variables, as well as binary predicates, with 1/2 values are shown as dotted edges. Summary individuals appear as double-circled nodes. A unary predicate that evaluates to 1/2 for a node is depicted by having =1/2 next to the name of the predicate. We denote the set of all 3-valued logical structures over a set of predicates P by 3-STRUCT[P], and usually abbreviate it to 3-STRUCT.

We define a partial order on structures, denoted by ⊑, based on the concept of embedding.

Definition 3 (Embedding). Let S = (U, ι) and S′ = (U′, ι′) be two structures and let f : U → U′ be a surjective function. We say that f embeds S in S′, denoted S ⊑f S′, if for every predicate p ∈ P of arity k and k individuals u1, . . ., uk ∈ U,

p^S(u1, . . ., uk) ⊑ p^S′(f(u1), . . ., f(uk)) .   (1)

We say that S is embedded in S′, denoted S ⊑ S′, if there exists a function f such that S ⊑f S′. We also say that S′ approximates S.

The embedding order is used to define a concretization function for a single 3-valued structure S by σ(S) = {S′ ∈ 2-STRUCT | S′ ⊑ S}. The concretization of a set of 3-valued structures is defined by γ(XS) = ⋃_{S ∈ XS} σ(S). The embedding order induces a Hoare preorder on sets of 3-valued structures.

Definition 4. For sets of structures XS1, XS2 ⊆ 3-STRUCT, XS1 ⊑ XS2 if and only if ∀S1 ∈ XS1 : ∃S2 ∈ XS2 : S1 ⊑ S2.

In the following definition, we restrict sets of 3-valued structures by disallowing non-maximal structures. This ensures that the Hoare ordering is a proper partial ordering on the sets. We are now ready to present the abstract domain which is considered for the construction of the meet algorithm.

Definition 5 (Core Abstract Domain). The abstract domain D_3-STRUCT consists of all sets of 3-valued structures that do not contain non-maximal structures, {XS ⊆ 3-STRUCT | ∀S1, S2 ∈ XS : S1 ⊑ S2 ⟹ S1 = S2}, with the same ordering as in Definition 4.

2.3 Bounded Program States

Note that the size of a 3-valued structure is potentially unbounded and that 3-STRUCT is infinite. The abstractions studied in [7], and also used for the analysis in Section 5, rely on a fundamental abstraction function for converting a potentially unbounded structure—either 2-valued or 3-valued—into a bounded 3-valued structure. A 3-valued structure S is said to be bounded if for every two distinct individuals u1, u2 in its universe there exists a unary predicate p such that either p^S(u1) = 0 and p^S(u2) = 1, or p^S(u1) = 1 and p^S(u2) = 0. (This notion can be generalized by considering any subset of the set of unary predicates, as done in TVLA.) We denote the set of all bounded 3-valued structures over a set of predicates P by B-STRUCT[P].
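Since the check of Definition 3 quantifies over surjections, for tiny structures it can be decided by brute force. The following sketch uses our own toy encoding (structures as dicts with total interpretations over node tuples), not TVLA's representation:

# Brute-force embedding check: does some surjective f : U -> V satisfy
# p(u1..uk) below-or-equal p(f(u1)..f(uk)) for every predicate and tuple?
from itertools import product

HALF = 'half'
def leq(x, y): return x == y or y == HALF

def embeds(U, V, interpS, interpT):
    """interpS/interpT: predicate -> total dict over k-tuples of nodes."""
    for image in product(V, repeat=len(U)):
        f = dict(zip(U, image))
        if set(image) != set(V):
            continue                      # f must be surjective
        if all(leq(val, interpT[p][tuple(f[u] for u in tup)])
               for p in interpS for tup, val in interpS[p].items()):
            return True
    return False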
The finite abstract domain D_B-STRUCT is a sublattice of D_3-STRUCT, containing all sets of bounded structures that do not contain non-maximal structures.

The abstraction function β_blur^P : 2-STRUCT[P] → B-STRUCT[P] converts a (potentially unbounded) 2-valued structure into a bounded 3-valued structure by merging all individuals with the same values for all unary predicates. Namely, β_blur^P((U, ι)) = (U′, ι′), where U′ is the set of equivalence classes in U of nodes with the same values for all unary predicates, and the interpretation ι′ of each predicate p ∈ P of arity k and k individuals c1, . . ., ck ∈ U′ is given by

p^S′(c1, . . ., ck) = ⊔_{ui ∈ ci} p^S(u1, . . ., uk) .

Fig. 2(b) shows a bounded structure obtained from the structure in Fig. 2(a). The abstraction function β_blur, which is called canonical abstraction, serves as the basis for abstract interpretation in TVLA [5]. In particular, it serves as the basis for defining various different abstractions for the (potentially unbounded) set of 2-valued logical structures that may arise at a program point, by defining different sets of predicates. We also define the function α, which extends β_blur to sets of structures: α(XS) = ⊔{β_blur(S) | S ∈ XS}, where ⊔ is the least upper bound operator in D_B-STRUCT.
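A compact sketch of canonical abstraction under the same toy encoding as above (total dicts for unary and binary predicates; the eq predicate and summary-node bookkeeping are omitted for brevity):

# Canonical abstraction: merge individuals agreeing on all unary
# predicates; binary predicate values are joined pointwise over the
# merged pairs.
HALF = 'half'
def join(x, y): return x if x == y else HALF

def beta_blur(U, unary, binary):
    """unary: pred -> {node: 0/1}; binary: pred -> {(n1, n2): 0/1}."""
    canon = {u: tuple(unary[p][u] for p in sorted(unary)) for u in U}
    classes = set(canon.values())   # canonical names = equivalence classes
    # members of a class agree on all unary predicates by construction
    a_unary = {p: {canon[u]: unary[p][u] for u in U} for p in unary}
    a_binary = {p: {} for p in binary}
    for p in binary:
        for (u, v), val in binary[p].items():
            key = (canon[u], canon[v])
            prev = a_binary[p].get(key)
            a_binary[p][key] = val if prev is None else join(prev, val)
    return classes, a_unary, a_binary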
3 Inferring Temporal Properties Via Staged Bidirectional Analysis

Persistent temporal properties can be efficiently verified without explicitly representing traces. An example of such a property is the liveness of reference variables and reference fields. A reference variable or reference field is said to be dead (i.e., not live) at a given program point if, on every execution that goes through that point, it is not used before being redefined. The (possibly infinite) set of temporal properties is defined as the least fixed point of the following (not necessarily computable) system of equations:

→CS_entry = CS_init
→CS_l2 = { S_out | (l1, l2) ∈ E, S_in ∈ →CS_l1, (l1, S_in) → (l2, S_out) }
←CS_exit = CS_final ∩ →CS_exit
←CS_l1 = { S_in | (l1, l2) ∈ E, S_out ∈ ←CS_l2, (l1, S_in) ← (l2, S_out) } ∩ →CS_l1

Here, it is assumed that the concrete 2-valued structures also record information on temporal properties that hold on program executions. The program is represented as a control flow graph, with entry and exit nodes entry and exit, respectively, and a set of control flow edges E. CS_init is the initial set of concrete stores at the entry location, including all possible values associated with temporal properties. CS_final represents the set of states in which all temporal properties are set to their final values (that is, their values upon termination of the execution). We write (l1, S_in) → (l2, S_out) to denote the transformation induced by the forward execution of the statement or condition at edge (l1, l2). Program conditions are interpreted according to the standard semantics. Note that the forward semantics sets values non-deterministically for the temporal-property predicates. We write (l1, S_in) ← (l2, S_out) to denote the transformation induced by the backward execution of the statement or condition at edge (l1, l2). This semantics sets the values of the changed temporal properties. Variables whose values are changed are updated non-deterministically.

The above system of equations does not necessarily terminate for programs with loops. Therefore, an upper approximation to this system is conservatively computed by representing sets of states using 3-valued structures. Extra predicates store the values of the tracked temporal properties. Moreover, the ability to define unary predicates allows tracking an unbounded number of temporal properties. Both forward and backward executions are conservatively executed on 3-valued structures. However, as backward reasoning uses results obtained by its forward counterpart, it is considered a secondary stage taking place after the forward reasoning is complete. Finally, intersection (∩) is over-approximated using meet (⊓).

3.1 Compile-Time GC Analysis

We now explain how compile-time garbage collection information can be computed using phased bidirectional verification. In particular, we are interested in identifying the first point in the trace where an object is not further used, and may therefore be safely deallocated by a free statement. Thus, the backward execution of a statement tracks the use of objects.

Our analysis maintains the predicate use(v) to track object future-usage information. An object v is deemed used in a statement or a condition at edge (l1, l2) if a reference expression e that evaluates to v is used for dereference at that statement. In such a case, the backward execution of the statement (l1, S_in) ← (l2, S_out) records in S_in the fact that v is used, by setting use(v) to 1. As mentioned, the forward execution of a statement non-deterministically sets values to use(v).

Fig. 3(a) shows one of the structures that arise before the statement t = y.n at line 10 of Fig. 1, and Fig. 3(b) shows one of the structures that arise after that statement. The object referenced by y is still used before the statement, as use(v) holds for the individual referenced by y. Nonetheless, the object referenced by y is not (further) used after that statement, as use(v) does not hold for the individual referenced by y. Having verified that use(v) does not hold for any individual v referenced by y, for all structures that may arise after the aforementioned statement, we conclude that free y may be inserted after the statement t = y.n to deallocate the object referenced by y, as it is no longer used in the program. Moreover, since for all structures arising before that statement the object referenced by y is still used, placing a free y after that statement will free the space referenced by y at the earliest possible time.

Fig. 3. 3-valued structures representing sets of program configurations, including heap object and reference field liveness, that arise (a) before the execution of the statement t = y.n; and (b) after it is executed. (The original figure shows the two shape graphs annotated with the predicates x, y, t, rx,n, ry,n, rt,n, liven and use.)

3.2 Assign-Null Analysis

Another application of phased bidirectional analysis is the computation of heap reference liveness, providing for compile-time optimization of runtime garbage collection effectiveness. For each object reference field, we identify whether it is live at any point in the trace, meaning that it may be used, prior to being redefined, after that point. We are interested in spotting points in the trace where a reference field becomes dead, and may therefore be assigned a null value, thus significantly reducing potential GC drag time [8].
3.2 Assign-Null Analysis

Another application of staged bidirectional analysis is the computation of heap reference liveness, which enables compile-time optimization of the effectiveness of runtime garbage collection. For each object reference field, we identify whether it is live at any point in the trace, meaning that it may be used, prior to being redefined, after that point. We are interested in spotting points in the trace where a reference field becomes dead, and therefore may be assigned a null value, thus significantly reducing potential GC drag time [8].

Here, again, the backward execution of a statement tracks the uses (dereferences) and redefinitions (assignments) of object fields. In particular, for each reference field f of an object v, the predicate livef(v) is used to record future use and redefinition information (in our example, f is n). A reference field f of an object v is considered used by a statement or condition at edge (l1, l2) if an expression e that is not an l-value refers to the value of f; in this case, the backward execution (l1, S_out) ← (l2, S_in) of the statement sets livef(v) to 1. Otherwise, f is considered redefined if it is assigned a new value, namely, if it is referred to by an l-value expression e; in this case, the backward execution of the statement sets livef(v) to 0. Here as well, the forward execution sets the values of livef(v) non-deterministically. Section 5 includes experimental results for an implementation of both analyses described in this section.

4 Computing the Meet of Heap Abstractions

In this section, we develop a meet algorithm for a family of abstract domains, and discuss the complexity of the algorithm for two cases: (i) arbitrary 3-valued structures, and (ii) bounded structures.

4.1 The Problem Setting

Our aim is to provide an algorithm applicable to a family of abstract domains based on 3-valued structures, including the abstract domain of bounded structures, D_B-STRUCT. We design a meet algorithm for the domain D_3-STRUCT, which we consider as a basis for other abstract (sub-)domains. Given a sub-domain D ⊆ D_3-STRUCT and a set of abstract elements X ⊆ D, the result of the algorithm is possibly not an element of D. However, when ⊓_D X is defined, the inequality ⊓_D X ⊑ ⊓_{D_3-STRUCT} X holds. Therefore, a domain-specific operator Refine_D : D_3-STRUCT → D can be used to refine the result so as to yield an element of D: Refine_D(⊓_{D_3-STRUCT} X) = ⊓_D X. For certain abstract domains, including D_B-STRUCT, no refinement is required. We now explain this formally.

Definition 6. We say that an abstract domain D ⊆ D_3-STRUCT, with the same ordering between abstract elements as in D_3-STRUCT (see Definition 4), is meet-admissible when it satisfies the following conditions.
Sublattice of D_3-STRUCT. D is a lattice, and ⊓_D X = ⊓_{D_3-STRUCT} X and ⊔_D X = ⊔_{D_3-STRUCT} X for every finite subset X of D.
Closure of singletons. For every structure S ∈ 3-STRUCT, if S occurs in some set XS ∈ D then {S} ∈ D. This condition allows us to break the problem of computing meet on sets of structures into sub-problems in which meet is computed on pairs of structures.

Theorem 1. The (parametric) abstract domain of bounded structures, D_B-STRUCT, is meet-admissible.

The following proposition reduces the problem of computing the meet of two sets of structures to the problem of computing the meet of two structures, using the join operator, which we discuss at the end of this section.

Proposition 1. Let XS1, XS2 be two elements of a meet-admissible domain D. Then

  XS1 ⊓ XS2 = ⊔_{S1 ∈ XS1, S2 ∈ XS2} ({S1} ⊓ {S2}).   (2)

In the remainder of this section, we consider the following problem: given two structures S1, S2 ∈ 3-STRUCT, compute {S1} ⊓ {S2}.

4.2 Computing the Meet of Two Structures

Fig. 4 shows two structures and their meet. (For now, ignore the edges between the structures in Fig. 4(a) and Fig. 4(b).)
The structure in Fig. 4(a) arises during forward shape analysis, after the statement t = y.n at line 10 of the running example; it is the structure from Fig. 2(b) with non-deterministic assignments to the values of the predicates use(v) and liven(v). The structure in Fig. 4(b) is obtained from the structure in Fig. 3(b) by backward execution of the statement y = t at line 11 of the running example. The meet of these two structures is the structure shown in Fig. 4(c).

We now establish a connection between the structures that comprise the result of meet and certain relations that hold between their individuals. We first define the meet of two Kleene values t1 and t2: if t1 ⊑ t2 then t1 ⊓ t2 = t1; if t2 ⊑ t1 then t1 ⊓ t2 = t2; otherwise the result is undefined, and we denote it by the special symbol ⊥.

Definition 7 (Meet Correspondence). Given two structures S1 = (U1, ι1) and S2 = (U2, ι2), a relation M ⊆ U1 × U2 is a meet correspondence between S1 and S2 when it is: (a) Full, i.e., ∀u1 ∈ U1 : ∃v2 ∈ U2 : u1 M v2 and ∀v2 ∈ U2 : ∃u1 ∈ U1 : u1 M v2; and (b) Consistent, i.e., for every predicate p of arity k and every pair of k-tuples ⟨u1, . . . , uk⟩ ∈ U1^k and ⟨v1, . . . , vk⟩ ∈ U2^k such that ui M vi for i = 1 . . . k,

  p^{S1}(u1, . . . , uk) ⊓ p^{S2}(v1, . . . , vk) ≠ ⊥.

Fig. 4. An example of computing meet for the running example. (a) A structure that arises during the forward shape analysis; (b) a structure that arises during the backward (object liveness) analysis; (c) the meet of (a) and (b).

The structures in Fig. 4(a) and Fig. 4(b) have exactly one meet correspondence, which is shown by the edges between their individuals. We can use a meet correspondence to construct a common lower bound of two structures in the following way.

Definition 8. Given a meet correspondence M between structures S1 = (U1, ι1) and S2 = (U2, ι2), the operation S1 ⊓_M S2 yields the M-induced structure S = (U, ι), where U = {⟨u, v⟩ ∈ M}, and the interpretation of every predicate p of arity k on every k-tuple of nodes ⟨u1, v1⟩, . . . , ⟨uk, vk⟩ ∈ U^k is given by

  p^S(⟨u1, v1⟩, . . . , ⟨uk, vk⟩) = p^{S1}(u1, . . . , uk) ⊓ p^{S2}(v1, . . . , vk).

We are now ready to characterize the result of the meet operator in terms of meet correspondences.

Theorem 2. Let M_{S1,S2} ⊆ ℘(U1 × U2) denote the set of meet correspondences between structures S1 and S2. Then

  {S1} ⊓ {S2} = ⊔_{M ∈ M_{S1,S2}} {S1 ⊓_M S2}.

Theorem 2 already gives us a naive way to compute meet: (a) enumerate all relations M ⊆ U1 × U2; (b) check each of them to see whether it constitutes a meet correspondence; (c) for each meet correspondence, compute S1 ⊓_M S2; and (d) combine the results via join. Although the meet of two structures is a set containing 2^{|U1 × U2|} structures in the worst case, the size of the set is usually small in practice. Notice, however, that the above approach is intractable even when the number of structures is small, since the majority of the relations are not meet correspondences. An immediate consequence of [10] is that deciding whether the meet of two arbitrary 3-valued structures is empty is NP-complete.
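The following is a minimal sketch of Definitions 7 and 8 under our own encoding of 3-valued structures as dictionaries from a predicate and a node tuple to a Kleene value; None stands for the undefined result ⊥. This encoding is for illustration only and is not TVLA's representation.

```python
import itertools

HALF = 0.5   # the indefinite Kleene value 1/2

def kleene_meet(t1, t2):
    """Meet of two Kleene values; None plays the role of the symbol ⊥."""
    if t1 == t2 or t2 == HALF:      # t1 is below or equal to t2
        return t1
    if t1 == HALF:                  # t2 is strictly more definite than t1
        return t2
    return None                     # 0 against 1: inconsistent

def induced_structure(s1, s2, m, predicates, arity):
    """Compute S1 meet_M S2 for a meet correspondence m (a set of node pairs)."""
    nodes = list(m)
    s = {}
    for p in predicates:
        for tup in itertools.product(nodes, repeat=arity[p]):
            left = tuple(u for (u, _) in tup)
            right = tuple(v for (_, v) in tup)
            val = kleene_meet(s1[(p, left)], s2[(p, right)])
            if val is None:
                return None         # m was not consistent after all
            s[(p, tup)] = val
    return s
```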
The next theorem states that meet is computationally hard even for two bounded structures.

Theorem 3. Given two bounded structures S1 and S2, the problem of deciding whether {S1} ⊓ {S2} = ∅ is NP-complete.

Since meet therefore cannot be computed with polynomial worst-case complexity (unless P = NP), we aim instead for good efficiency in practice. We develop an algorithm based on a strategy that exploits certain properties of the abstract domain to prune the set of candidate relations while searching for the meet correspondences. In Section 5, we supply empirical evidence showing that the algorithm successfully prunes most irrelevant relations when used in an abstract interpreter for inferring temporal heap properties on several benchmark programs.

4.3 Enumerating Meet Correspondences

We now present a strategy for exploring the (exponential) space of relations between two structures in search of meet correspondences. The strategy, shown in pseudo-code in [1], prunes relations that cannot constitute a meet correspondence as early as possible, and relies on a procedure for solving a graph-matching problem (explained below). The strategy consists of four stages that are run consecutively:

1. Consistency of nullary predicates. If there exists a nullary predicate p such that p^{S1}() = 1 and p^{S2}() = 0, or p^{S1}() = 0 and p^{S2}() = 1, then the result of meet is the empty set.
2. Removing infeasible node pairs. We remove from the set U1 × U2 all node pairs ⟨u, v⟩ such that there exists a predicate p of arity k with p^{S1}(u^k) ⊓ p^{S2}(v^k) = ⊥, where u^k denotes the k-tuple containing the node u in all k positions. By Definition 7, such pairs are not contained in any meet correspondence.
3. Finding full relations. To satisfy the fullness requirement of Definition 7, we solve the following graph-matching problem: given a graph G = ⟨V, E⟩ and a subset W of V, find all subsets M ⊆ E such that in the graph ⟨V, M⟩ the degree of every vertex is at least 1, and the degree of every vertex in W is at most 1. In our case, V = U1 ∪ U2, E is the set of pairs remaining after the previous stage, and W is the set of non-summary nodes. A (worst-case exponential-time) algorithm for this problem, which uses several heuristics to solve it efficiently, is discussed in [1].
4. Consistency test. The full relations from the previous stage are tested for consistency according to Definition 7 (in polynomial time). The relations that pass the test are meet correspondences; they are used to create the M-induced structures, which are combined via join to yield the result.

The intuition behind the second stage is that two structures, possibly produced by different analyses, may share a common set of unary predicates that are assigned only definite values, i.e., 0 or 1. Usually, these are the predicates that represent reference variables. In such cases, these predicates help prune many of the infeasible edges and determine a subset of edges with degree 1 that must participate in every meet correspondence. Our algorithm uses these edges to reduce the amount of searching that has to be done.
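As an illustration of the second stage, here is a minimal sketch of the per-pair pruning, reusing kleene_meet and the dictionary encoding from the sketch above.

```python
def feasible_pairs(u1_nodes, u2_nodes, s1, s2, predicates, arity):
    """Stage 2: keep only node pairs <u, v> passing the per-pair check."""
    pairs = set()
    for u in u1_nodes:
        for v in u2_nodes:
            # compare each predicate p on the tuples (u, ..., u) and (v, ..., v)
            if all(kleene_meet(s1[(p, (u,) * arity[p])],
                               s2[(p, (v,) * arity[p])]) is not None
                   for p in predicates):
                pairs.add((u, v))
    return pairs
```

Only the surviving pairs are passed on to the graph-matching stage.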
In Fig. 4, the first stage of the algorithm is degenerate, as there are no nullary predicates. The second stage prunes the set of all node pairs, which consists of 20 pairs, down to 5. This reduction occurs because the predicates x, t, rx,n, and rt,n have definite values in both structures. In this example there is only one full relation, which is returned by the third stage of the algorithm. This relation is indeed consistent, and thus the structure in the output is produced.

4.4 Computing Join

The join of sets of 3-valued structures is set union, followed by removal of non-maximal structures. To remove non-maximal structures, we need an algorithm that checks whether a structure S1 = (U1, ι1) is embedded in a structure S2 = (U2, ι2). We observe that an embedding relation (see Definition 3) is in fact a meet correspondence satisfying a stricter version of the consistency condition. It is: (i) full; and (ii) for every predicate p of arity k and every pair of k-tuples ⟨u1, . . . , uk⟩ ∈ U1^k and ⟨v1, . . . , vk⟩ ∈ U2^k such that ui M vi for i = 1 . . . k,

  p^{S1}(u1, . . . , uk) ⊑ p^{S2}(v1, . . . , vk).

Since checking the second condition for two structures can be done in polynomial time, we can reuse the techniques for finding meet correspondences. In the first stage we check that p^{S1}() ⊑ p^{S2}() for every nullary predicate p. In the second stage we remove from the set U1 × U2 all node pairs ⟨u, v⟩ such that there exists a predicate p of arity k with p^{S1}(u^k) ⋢ p^{S2}(v^k). We then proceed by enumerating full relations over the remaining node pairs to find one that fulfills the second condition of the embedding relation. For arbitrary 3-valued structures, checking embedding is NP-complete. However, for bounded structures our algorithm decides the problem in polynomial time; this is because, for two bounded structures, an embedding relation, if one exists, is unique and completely determined by the unary predicates.

5 Inferring Temporal Properties for Compile-Time Memory Management

Compile-time GC is most desirable for lightweight Java-based platforms, such as JavaCard, where the penalty induced by a runtime GC is sometimes intolerable due to the limited space and processing power. Such platforms normally provide a mechanism for explicit memory deallocation, e.g., through a free directive. We have implemented the staged bidirectional analysis described in Section 3 in the TVLA system to infer compile-time GC information. Our analysis infers information for producing a set of free statements that can be safely added to the program to free unused objects. Moreover, our analysis ensures that an object is deallocated at the earliest possible time, i.e., immediately after the object is last used.

5.1 Experimental Results

Table 2 shows our benchmark programs, which were used in [9].³ The first four programs involve manipulations of singly-linked lists. DLoop and DPairs involve manipulations of doubly-linked lists. The small-javac example was used in [8], where it was shown, by manually rewriting the code to include null assignments, that a significant potential for compile-time GC exists; our assign-null analysis yields this manual rewriting automatically.

On all benchmark programs, both our compile-time GC and assign-null analyses were able to detect all opportunities for object deallocation and for safe assignment of null to reference fields, respectively. This information allows the reclamation of unused space at the earliest possible time. For example, for the program in Fig. 1, the compile-time GC analysis was able to determine the safe deallocation of the object pointed to by y right after line 10, thus deallocating list elements as soon as they have been traversed. Our assign-null analysis was able to verify that a y.n = null assignment could be inserted after line 10. The analyses proved similar properties for the other benchmark programs.

³ The programs are available from www.cs.tau.ac.il/~rumster/ctgc_benchmarks.zip.
Table 2. Benchmarks and analysis costs (time in seconds, space in MB)

  Program      Description                             Forward         Backward
                                                       Time   Space    Time   Space
  Loop         Running example (Fig. 1)                 0.9    1.0      1.6    1.8
  CReverse     Constructive list reversal               3.0    2.0      5.7    4.2
  Delete       Deletion of a list element              12.4    3.2     41.1   12.9
  DLoop        Doubly-linked list variant of Loop       1.4    1.3      2.3    2.5
  DPairs       Doubly-linked list traversal in pairs    3.0    2.0      5.5    4.0
  small-javac  Emulation of JavaC's parser facility   528.9   32.1    334.4   77.6

Table 2 also shows the costs of the analyses on the benchmark programs. As both analyses have very similar costs, we only show the results of the compile-time GC analysis. The experiments were conducted on a 1.6 GHz laptop with 512 MB of memory, running Windows XP.

In addition to analysis time and space, we measured two redundancy factors related to our meet algorithm. First, we evaluated the efficiency of the graph-matching algorithm in stage 3: for all benchmark programs, at most 0.5% of the expanded search space did not lead to valid matchings. Second, we measured the percentage of full relations computed during stage 3 of the algorithm that did not constitute meet correspondences (and were therefore eliminated in stage 4): in all benchmarks, the average proportion of eliminated relations did not exceed 0.3%, and in most benchmarks no eliminations occurred at all.

We believe that our meet algorithm is efficient for the following reason. The forward shape analysis produces very precise information, which means that the values of the unary shape predicates (in Table 1) are almost always definite. When the backward phase computes the backward effect of a statement, it accepts a structure in which all unary predicates are definite, and assigns non-deterministic values to only a fixed number of unary predicates (y and ry,n in the structure shown in Fig. 4(b)). The meet is thus applied to structures in which most unary shape predicates have definite values. Our algorithm is geared to exploit these situations by focusing the search for meet correspondences.

6 Related Work

Computing the Meet of Heap Abstractions. In [3], a meet is used for interprocedural shape analysis, and two algorithms are presented for computing meet on bounded structures. The first algorithm uses a "canonicalization"⁴ operation to transform the structures into sets of structures in the image of canonical abstraction with the same concretization; computing meet for the resulting structures is then straightforward. However, canonicalization can unnecessarily increase the number of structures, by an exponential factor in the worst case. In our examples the worst case would indeed manifest itself, since we set the values of the temporal heap properties to non-deterministic values. Our algorithm avoids this problem by operating directly on the given structures. The second algorithm approximates meet by transforming one of the structures into a dynamic set of constraints and using a constraint solver. While usually more efficient than the first algorithm, it computes an over-approximation of meet. We believe that our algorithm can be used to improve the running times of the interprocedural analysis reported in [3].

⁴ Canonicalization is a semantic reduction akin to substituting abstract elements by their respective sets of join-irreducibles.
In [4], it is shown how to compute meet for a class of formulas that precisely characterize bounded structures. The computation is achieved in essentially the same way as in the first algorithm of [3].

In [11], a symbolic semi-algorithm for meet is presented. The algorithm converts bounded structures to formulas, and then uses logical conjunction to compute the result in the domain of formulas. Converting the resulting formula back to bounded structures is done via a theorem prover. The algorithm operates with respect to a finer concretization function than the one defined in Section 2. Specifically, this concretization function is parameterized by a set of integrity constraints C, and is defined by

  γ_C(S) = { S' ∈ 2-STRUCT | S' ⊑ S, S' ⊨ C }.

The advantage of this algorithm is that it provides the most precise result with respect to γ_C. However, its performance can be quite low, due to the use of canonicalization and a potentially large number of calls to a theorem prover. A distinct advantage of the algorithm presented in this paper is that it is not restricted to bounded structures: it works for any set of 3-valued structures.

Compile-Time Memory Management. Most of the work on compile-time GC analysis has been done for functional languages. This paper demonstrates a compile-time GC analysis that applies to an imperative language with destructive updates, and that is capable of reclaiming an object which is still reachable but not used further in the run.

In [9], a user-specification-driven compile-time GC and assign-null analysis is described. The user specifies a free query of the form (pt, x), where pt is a program location and x is a program variable. A positive answer to the query means that a free x statement may be issued after program point pt. In contrast, the algorithms in this paper neither require nor rely on user-specified queries; rather, they analyze an exhaustive set of queries generated automatically using a simple heuristic. We believe our approach may be significantly more efficient than running the analysis of [9] on our exhaustive set of queries.

References

1. G. Arnold, R. Manevich, M. Sagiv, and R. Shaham. Intersecting heap abstractions with applications to compile-time memory management. Technical Report TR-2005-04-135520, Tel Aviv University, Apr. 2005. Available at http://www.cs.tau.ac.il/~rumster/TR-2005-04-135520.pdf.
2. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction of approximation of fixed points. In Symp. on Princ. of Prog. Lang., pages 238–252, New York, NY, 1977. ACM Press.
3. B. Jeannet, A. Loginov, T. Reps, and M. Sagiv. A relational approach to interprocedural shape analysis. In Proc. Static Analysis Symp. Springer, 2004.
4. V. Kuncak and M. Rinard. Boolean algebra of shape analysis constraints. In 5th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI'04), pages 59–72. Springer, January 2004.
5. T. Lev-Ami and M. Sagiv. TVLA: A system for implementing static analyses. In Proc. Static Analysis Symp., pages 280–301, 2000.
6. Z. Manna and A. Pnueli. A hierarchy of temporal properties (invited paper). In Proceedings of the Ninth Annual ACM Symposium on Principles of Distributed Computing, pages 377–410, 1989.
7. M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM Transactions on Programming Languages and Systems, 24(3):217–298, 2002.
8. R. Shaham, E. K. Kolodner, and M. Sagiv. Heap profiling for space-efficient Java. In SIGPLAN Conf. on Prog. Lang. Design and Impl., pages 104–113. ACM Press, June 2001.
9. R. Shaham, E. Yahav, E. K. Kolodner, and M. Sagiv. Establishing local temporal heap safety properties with applications to compile-time memory management. In Proc. of Static Analysis Symposium (SAS'03), volume 2694 of LNCS, pages 483–503. Springer, June 2003.
10. G. Yorsh. Logical characterizations of heap abstractions. Master's thesis, Tel-Aviv University, Tel-Aviv, Israel, 2003. http://www.cs.tau.ac.il/~gretay/.
11. G. Yorsh, T. Reps, and M. Sagiv. Symbolically computing most-precise abstract operations for shape analysis. In Tools and Algorithms for the Construction and Analysis of Systems, 10th International Conference (TACAS 2004), pages 530–545. Springer, March 2004.

A Complete Abstract Interpretation Framework for Coverability Properties of WSTS

Pierre Ganty¹*, Jean-François Raskin¹, and Laurent Van Begin²

¹ Département d'Informatique, Université Libre de Bruxelles, {pganty, jraskin}@ulb.ac.be
² LIAFA, Université Paris 7, lvbegin@liafa.jussieu.fr

* Supported by the FRFC project "Centre Fédéré en Vérification" funded by the Belgian National Science Foundation (FNRS) under grant nr 2.4530.02. Pierre Ganty is supported by the FNRS under a FRIA grant.

Abstract. We present an abstract interpretation based approach to solve the coverability problem of well-structured transition systems. Our approach is distinguished from other attempts in two respects: (1) we solve this problem for the whole class of well-structured transition systems using a forward algorithm, so our algorithm has to deal with possibly infinite downward closed sets; (2) whereas other approaches rely on non-generic representations for downward closed sets of states, which turn out to be hard to devise in practice, we introduce a generic representation requiring no additional implementation effort.

1 Introduction

Model checking is nowadays widely accepted as a powerful technique for the automatic verification of reactive systems that have natural finite-state abstractions. However, many reactive systems are only naturally modeled as infinite-state systems. This is why a large research effort has been made in recent years to allow the direct application of model checking techniques to infinite-state models. This line of research has shown successes for several interesting classes of infinite-state systems, for example: timed automata [1], hybrid automata [2], fifo channel systems [3, 4], extended Petri nets [5, 6], broadcast protocols [7], etc.

General decidability results hold for a large class of infinite-state systems called well-structured transition systems, WSTS for short. WSTS are transition systems whose sets of states are well-quasi ordered and whose transition relations enjoy a monotonicity property with respect to the well-quasi order. Examples of WSTS are Petri nets [8], monotonic extensions of Petri nets (Petri nets with transfer arcs [9], Petri nets with reset arcs [10], and Petri nets with non-blocking arcs [11]), broadcast protocols [12], and lossy channel systems [3]. For all those classes of infinite-state systems, we know that an interesting and large class of safety properties is decidable by reduction to the coverability problem. The coverability problem is defined as follows: "given a WSTS for the well-quasi
order ≼, and two states c1 and c2, does there exist a state c3 which is reachable from c1 and such that c3 ≽ c2?" (in that context, we say that c3 covers c2).

Broadly speaking, there are two ways to solve the coverability problem for WSTS. The first is to explore the transition system backwards, by iterating the pre operator¹ starting from the set of states that are greater than or equal to c2. This simple procedure is effective under very mild assumptions. In fact, for any well-quasi ordered set (X, ≼), the following nice property holds: every ≼-upward closed² set can be finitely represented by its finite set of minimal elements³. This generic representation of ≼-upward closed sets is adequate, as union and inclusion are effective. The only further property needed for the procedure to be effective is that, given a finite set of minimal elements M defining an ≼-upward closed set U, it must be possible to compute the finite set of minimal elements M' representing pre(U). Higman's lemma [13] on well-quasi orders ensures the termination of this procedure.

The second way is to explore the transition system forwards from the initial state c1. Here, the situation is more complicated. A saturation method that iterates the post operator⁴ from the initial state cannot lead to an algorithm, as the reachability problem is undecidable for WSTS. Recently, we have shown that the coverability problem can be decided in a forward manner by constructing two sequences of abstractions of the reachable states of the system, one from below and one from above [14]. The sequence of abstractions from below allows us to detect positive instances of the coverability problem; it is simply the bounded iteration of post from the initial state. The abstraction from above is the iteration, over downward closed sets of states, of an over-approximation of post that becomes more and more precise; this sequence allows us to decide negative instances of the problem. This algorithmic schema is general, but to apply it to a given class of WSTS the user has to provide a so-called adequate domain of limits: a (usually infinite) set of abstract values that allows any downward closed set to be represented. The situation is thus less satisfactory than for upward closed sets, for which, as we have seen above, there is a simple and generic representation by sets of minimal elements. Such a generic way of representing downward closed sets was missing, and this is the problem we solve here.

The contributions of this paper are as follows. First, we show that for any well-quasi ordered set there exists a generic and effective representation of downward closed sets. To the best of our knowledge, this is the first time that such a generic representation has been proposed. An attempt in that direction was made in [15], but the result is a theory for designing symbolic representations of downward closed sets and not a generic symbolic representation of such sets.

¹ A function that returns all the states that have a one-step successor in a given set of states.
² A set S is upward (resp. downward) closed if for any c such that c ≽ s (resp. c ≼ s) for some s ∈ S we have c ∈ S.
³ Or a finite set of its minimal elements if ≼ is not a partial order.
⁴ A function that returns all the one-step successor states of a given set of states.
As a consequence, their theory has to be instantiated for the particular class of WSTS that is targeted, and this is not a trivial task. Second, as downward closed sets are the abstractions of sets of reachable states in the forward algorithm, we formalize our generic representation of downward closed sets as a generic abstract domain. This allows us to rephrase in a simpler way, in the context of abstract interpretation, the forward algorithm first proposed in [14]. Third, we show how to automatically refine the abstract domain in order to obtain, in an efficient way, over-approximations that are guaranteed to be sufficiently precise to decide the coverability problem.

Our paper is organized as follows. Section 2 presents some preliminaries. Section 3 introduces the generic representation of downward closed sets. In Section 4 we are concerned with the abstract interpretation of WSTS. Section 5 is devoted to the refinement of the abstract domain. Section 6 shows on an example how these techniques work. A version of the paper containing all proofs is available at [16].

2 Preliminaries

2.1 Well-Quasi Ordered Sets

A preorder ≼ is a binary relation over a set X which is reflexive and transitive. The preorder ≼ is a well-quasi order (wqo for short) if there is no infinite sequence x0, x1, . . . such that xj ⋠ xi for all i > j ≥ 0; equivalently, every infinite sequence contains indices j < i with xj ≼ xi. A set M ⊆ X is said to be canonical if for any distinct x, y ∈ M we have x ⋠ y. We say that M ⊆ S is a minor set of S ⊆ X if for all x ∈ S there exists y ∈ M such that y ≼ x, and M is canonical.

Lemma 1 (From [17]). Let (X, ≼) be a well-quasi ordered set (wqo-set for short). Any set S ⊆ X has at least one finite minor set M.

We use min to denote a function which, given a set S ⊆ X, returns a minor set of S. Let (X, ≼) be a wqo-set; we call x↓ = {x' ∈ X | x' ≼ x} and x↑ = {x' ∈ X | x ≼ x'} the ≼-downward closure and ≼-upward closure of x ∈ X, respectively. This definition extends naturally to sets in X. We define a set S ⊆ X to be a ≼-downward closed set (≼-dc-set for short), respectively a ≼-upward closed set (≼-uc-set for short), iff S↓ = S, respectively S↑ = S. Examples of such sets are given in Fig. 1. For any wqo-set (X, ≼), we define DCS(X) (resp. UCS(X)) to be the set of all ≼-dc-sets (resp. ≼-uc-sets) in X. For any x ∈ X we define the ≼-equivalence class of x, denoted [x], to be the set x↑ ∩ x↓, i.e., the set of elements that are ≼-equivalent to x. For subsets A and B of X, we say that A ≡ B if A↑ = B↑. Observe that A ≡ B iff for all a ∈ A there is a b ∈ B such that a ≼ b, and vice versa.

We now recall a well-known lemma on ≼-uc-sets and ≼-dc-sets.

Lemma 2 (From [17]). Let (X, ≼) be a wqo-set and U0, U1, . . . an infinite sequence of ≼-uc-sets such that ∀i ≥ 0 : Ui ⊆ Ui+1. Then there exists j ≥ 0 such that ∀j' ≥ j : Uj = Uj'. Symmetrically, given an infinite sequence of ≼-dc-sets D0, D1, . . . such that ∀i ≥ 0 : Di ⊇ Di+1, there exists j ≥ 0 such that ∀j' ≥ j : Dj = Dj'.

Fig. 1. ≼-dc-sets and ≼-uc-sets in N². The wqo ≼ is defined as follows: (a1, a2) ≼ (b1, b2) if and only if a1 ≤ b1 and a2 ≤ b2. The ≼-dc-sets A = {(x, y) ∈ N² | y ≤ 1} and B = {(x, y) ∈ N² | x ≤ 1} are infinite; on the contrary, the ≼-dc-set C = {(x, y) ∈ N² | x ≤ 2 ∧ y ≤ 2} is finite. The ≼-uc-set D is given by {(x, y) ∈ N² | x ≥ 3 ∧ y ≥ 2}. Note that D has exactly one minor set, since ≼ is a partial order.

We now introduce a lemma stating several facts about sets and their closures. These facts are merely of technical interest and will be used subsequently.
Lemma 3.
1. For any S, S' ⊆ X: S↓ ∩ S'↑ = ∅ ⇔ S↓ ∩ S' = ∅ ⇔ S ∩ S'↑ = ∅.
2. For any S, S' ⊆ X: S↑ ⊆ S'↑ ⇔ ∀s ∈ S ∃s' ∈ S' : s' ≼ s.
3. ∀s ∈ X, S ∈ UCS(X) : s ∈ S ⇔ ∃s' ∈ min(S) : s' ≼ s.

Lemma 3.2 and 3.3 suggest an effective representation of ≼-uc-sets: every ≼-uc-set U can be finitely represented by min(U). For a decidable well-quasi order ≼, this readily gives us an effective procedure to check inclusion between two ≼-uc-sets, to check membership, and to compute union [18].

Notations. Sometimes we write s instead of the set {s}. Unless otherwise stated, the transitive and reflexive closure f* of a function f whose domain and codomain coincide is given by ⋃_{i≥0} f^i, where f^0 is the identity and f^{i+1} = f^i ∘ f. Finally, let us recall the following property of sets, which we will use without mention in our proofs: A ⊆ B iff A ∩ (X \ B) = ∅.

2.2 Well-Structured Transition Systems

In this paper we follow [19] in the definition of well-structured transition systems.

Definition 1. A well-structured transition system (WSTS) S is a tuple (X, δ, ≼) where X is a (possibly) infinite set of states, δ ⊆ X × X is a transition relation between states (we use the notation x → x' if (x, x') ∈ δ), and ≼ ⊆ X × X is a preorder between states such that the two following conditions hold: (i) ≼ is a wqo; and (ii) ∀x1, x2, x3 ∃x4 : (x1 ≼ x3 ∧ x1 → x2) ⇒ (x3 →* x4 ∧ x2 ≼ x4), where →* is the reflexive and transitive closure of the transition relation (upward compatibility)⁵. Moreover, we define an initialized WSTS (IWSTS) to be a pair (S, x0) where S = (X, δ, ≼) is a WSTS and x0 ∈ X is the initial state. We adhere to the convention that if S0 is an IWSTS then S is its WSTS.

⁵ Upward compatibility is more general than the compatibility used in [17].

Let S = (X, δ, ≼) be a WSTS and T ⊆ X; we define post[S](T) = {x' | ∃x ∈ T : x → x'}. Analogously, we define pre[S](T) as {x | ∃x' ∈ T : x → x'}. We define minpre[S](T) = min((pre[S](T↑))↑). To shorten notation, we write pre, minpre and post if the WSTS is clear from the context. The following definition follows [17].

Definition 2. An effective WSTS is a WSTS S = (X, δ, ≼) where both ≼ and → are decidable and minpre[S](x) is computable for all x ∈ X.

2.3 The Coverability Problem

The verification of safety properties of IWSTS reduces to the so-called coverability problem.

Problem 1. The coverability problem for IWSTS is defined as follows: "Given an IWSTS ((X, δ, ≼), x0) and bad ∈ UCS(X), does post*(x0) ∩ bad = ∅ hold?"

In general, bad is an upward closed set of states where errors occur. Two solutions to the coverability problem can be found in the literature. The first one (see [17, 19]) is a backward approach based on the following two lemmas:

Lemma 4 (From [19]). Given a WSTS S = (X, δ, ≼) and U ∈ UCS(X), (a) pre*(U) ∈ UCS(X), and (b) minpre*(min(U))↑ = pre*(U).

Lemma 4, together with Lemmas 1 and 2, shows how to (symbolically) compute the (possibly) infinite set pre*(U) using the minor sets of ≼-uc-sets. Once pre*(U) is computed, or rather a finite representation of it by one of its minor sets, one can decide the coverability problem by testing whether the initial state is in pre*(U), using Lemma 3.3.

The second approach is a forward approach based on the notion of covering set [20, 12]. The covering set Cover(S0) of an IWSTS S0 = (S, x0) is given by Cover(S0) = post*(x0)↓.
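As an illustration of the minor-set representation of ≼-uc-sets used by the backward approach above (Lemmas 3.2 and 3.3), here is a minimal sketch for the componentwise order on tuples of naturals, e.g., Petri net markings; the helper names and the encoding are ours, not part of any of the cited tools.

```python
def leq(a, b):
    """Componentwise order on equal-length tuples of naturals: a below b."""
    return all(x <= y for x, y in zip(a, b))

def minimize(elems):
    """Return a minor set: drop every element lying above another one."""
    return {a for a in elems
            if not any(b != a and leq(b, a) for b in elems)}

def member(x, min_u):
    """x in U, where U is represented by its minor set min_u (Lemma 3.3)."""
    return any(leq(m, x) for m in min_u)

def included(min_u1, min_u2):
    """U1 included in U2, via Lemma 3.2: every minimal element of U1 is in U2."""
    return all(member(m, min_u2) for m in min_u1)
```

For instance, the ≼-uc-set D of Fig. 1 is handled entirely through its single minor set {(3, 2)}: member((4, 7), {(3, 2)}) returns True.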
The following lemma shows the usefulness of covering sets for solving the coverability problem:

Lemma 5. Given an IWSTS S0 = ((X, δ, ≼), x0) and bad ∈ UCS(X), Cover(S0) ∩ bad = ∅ if and only if post*(x0) ∩ bad = ∅.

As already mentioned in the introduction, there are two difficulties to overcome when trying to design a forward algorithm for the coverability problem:

1. Currently, there is no generic way to effectively represent and manipulate ≼-dc-sets (like the one shown above for ≼-uc-sets). So, for every wqo-set (X, ≼), one has to design a symbolic representation for the sets in DCS(X).
2. The set Cover(S0) is in general not effectively constructible; see [10] for details. As a consequence, all the algorithms based on its construction (except the well-known Karp-Miller algorithm on Petri nets) may fail to terminate.

To overcome those two difficulties:

1. In [15], the authors propose a methodology for designing symbolic representations of dc-sets. However, the design of such a symbolic data structure is far from trivial.
2. The authors of this paper proposed, in [14], an algorithmic schema called expand, enlarge and check, which can be instantiated for any class of WSTS as long as a symbolic representation of dc-sets (called there an adequate set of limits) is provided.

In this paper, we provide what is, in our opinion, a much more satisfactory answer to those two difficulties: a completely generic algorithm to solve the coverability problem for WSTS, in the form of a generic abstract domain and a generic abstract analysis.

3 A Generic Abstract Domain

In this section, we present a parametrized abstract domain that allows us to represent any ≼-dc-set of a wqo-set (X, ≼). The parameter D is a finite subset of X, and it defines the precision of the abstract domain. We also show that this parametrized abstract domain enjoys the following properties: (i) it forms a complete lattice; (ii) the abstraction and concretisation functions we define form a Galois insertion; (iii) any ≼-dc-set can be represented exactly by our parametrized abstract domain, provided an adequate value of the parameter D is used; and (iv) each ≼-dc-set has a finite representation.

Recall that the powerset lattice PL(A) associated to a set A is the complete lattice having the powerset of A as carrier, and union and intersection as least upper bound and greatest lower bound, respectively. In our setting, the concrete lattice is the powerset lattice PL(X) over the set of states X. Fix a finite set D ⊆ X, which is called the finite domain. The abstract lattice DPL(D) has DCS(D) as carrier, ⊔_D as the least upper bound operator, ⊓_D as the greatest lower bound operator, and D and ∅ as the ⊑_D-maximal and ⊑_D-minimal elements, respectively. We define the relation ⊑_D over DCS(D) × DCS(D) such that for all P1, P2 ∈ DCS(D): P1 ⊑_D P2 if and only if P1 ⊆ P2; moreover, P1 ⊔_D P2 = P1 ∪ P2 and P1 ⊓_D P2 = P1 ∩ P2. Notice that DPL(D) is complete because the union and intersection operations are closed in DPL(D).

Given an abstract lattice DPL(D), the abstraction and concretisation mappings are given as follows:

  ∀E ∈ PL(X) :   α[D](E) = E↓ ∩ D
  ∀P ∈ DPL(D) :  γ[D](P) = {x ∈ X | x↓ ∩ D ⊆ P}.

The set between square brackets is the parameter of the function and the set between parentheses is its argument. For simplicity of notation, we also write γ(P), α(E), ⊑, ⊔ and ⊓ if the parameter is clear from the context.
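A minimal sketch of these two maps for the componentwise order on N², reusing leq from the sketch in Section 2. Since the argument E may be infinite, we pass its downward closure as a membership predicate; this encoding choice is ours.

```python
def alpha(d, in_e_down):
    """alpha[D](E) = E-down intersected with D (E-down given as a predicate)."""
    return frozenset(x for x in d if in_e_down(x))

def gamma_member(d, p, x):
    """x in gamma[D](P) iff x-down intersected with D is included in P."""
    return all(y in p for y in d if leq(y, x))
```

With the domain D of Example 1 below, alpha({(0,0), (3,0), (0,2), (0,3)}, lambda v: v[0] <= 1) returns exactly α(B) = {(0,0), (0,2), (0,3)}.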
We next show through an example that the finite domain D actually parametrizes the precision of the abstract domain with respect to the concrete domain.

Example 1. Let us consider the ≼-dc-sets of Fig. 1 and the finite domain D = {(0,0), (3,0), (0,2), (0,3)}, depicted by the grey dots. Applying α to the ≼-dc-sets A, B and C gives, respectively, the (abstract) sets α(A) = {(0,0), (3,0)}, α(B) = {(0,0), (0,2), (0,3)}, and α(C) = {(0,0), (0,2)}. A and C are represented exactly, i.e., γ(α(A)) = A and γ(α(C)) = C, but B is not: γ(α(B)) = {(x, y) ∈ N² | x ≤ 2}. But if we add (2,0) to D, then B becomes representable.

This generic abstract domain is a generalization of the ideas exposed in [21] for finite-state systems. Fix a finite domain D, the concrete PL(X) and abstract DPL(D) domains, and the abstraction α : PL(X) → DPL(D) and concretisation γ : DPL(D) → PL(X) maps; these form a Galois insertion.

Proposition 1. For every finite domain D, the pair (α[D], γ[D]) is a Galois insertion between PL(X) and DPL(D).

Proof. Fix a finite domain D. It follows immediately from the definitions that α is monotonic (i.e., C ⊆ C' implies α(C) ⊑ α(C')), and so is γ; indeed, γ(P1) ⊆ γ(P2) ⇔ {c | c↓ ∩ D ⊆ P1} ⊆ {c | c↓ ∩ D ⊆ P2} ⇔ P1 ⊆ P2 ⇔ P1 ⊑ P2. So, it suffices to prove (a) and (b) below.

(a) C ⊆ (γ ∘ α)(C) for every C ∈ PL(X):
  (γ ∘ α)(C) = {c ∈ X | c↓ ∩ D ⊆ C↓ ∩ D} ⊇ {c ∈ C | c↓ ∩ D ⊆ C↓ ∩ D} = C.

(b) (α ∘ γ)(P) = P for every P ∈ DPL(D):
  (α ∘ γ)(P) = {c | c↓ ∩ D ⊆ P}↓ ∩ D
             = {c | c↓ ∩ D ⊆ P} ∩ D      (γ(P)↓ = γ(P))
             = {c ∈ D | c↓ ∩ D ⊆ P}
             = P                          (P ⊆ D and P ∈ DCS(D))   □

We now prove some properties regarding the precision of our abstract domain. The next lemma states that any ≼-dc-set of X can be represented exactly using a suitable finite domain D and a set P ∈ DCS(D).

Lemma 6 (Completeness of the abstract domain). For each E ∈ DCS(X) there exists a finite domain D such that (γ ∘ α)(E) = E.

Proof. Given E, we define the finite domain D = min(X \ E). We prove (γ ∘ α)(E) = E. Let us first show that (γ ∘ α)(E) ⊆ E. Suppose, by contradiction, that there exists p ∈ (γ ∘ α)(E) with p ∉ E. On the one hand,

  p ∉ E
  ⇔ p↓ ⊈ E                                     (E ∈ DCS(X))
  ⇔ p↓ ∩ (X \ E) ≠ ∅
  ⇔ p↓ ∩ min(X \ E) ≠ ∅                        (Lemma 3.1)
  ⇔ ∃p' : p' ∈ p↓ ∧ p' ∈ min(X \ E)
  ⇒ ∃p' : p' ∈ p↓ ∧ p' ∈ min(X \ E) ∧ ∃p'' ∈ [p'] : p'' ∈ D     (def. of D)
  ⇔ ∃p'' : p'' ∈ p↓ ∧ p'' ∈ D ∧ p'' ∉ E        (p'' ≼ p' and p' ≼ p; p'' ∈ [p'] ⊆ X \ E)   (1)

On the other hand,

  p ∈ (γ ∘ α)(E)
  ⇔ p↓ ∩ D ⊆ α(E)                              (def. of γ)
  ⇔ p↓ ∩ D ⊆ E↓ ∩ D                            (def. of α)
  ⇔ p↓ ∩ D ⊆ E ∩ D                             (E ∈ DCS(X))
  ⇔ p↓ ∩ D ⊆ E
  ⇔ ∀p'' : (p'' ∈ p↓ ∧ p'' ∈ D) ⇒ p'' ∈ E      (2)

From (1) and (2) follows a contradiction. The inclusion E ⊆ (γ ∘ α)(E) is immediate by the property of Galois insertions. So we have proved that (γ ∘ α)(E) = E. □

Remark 1. While the previous lemma states that any ≼-dc-set can be represented using an adequate finite domain D, there is usually no single finite domain D which is able to represent all the ≼-dc-sets. It should be pointed out that ≼-dc-sets can easily be represented through their (≼-uc-set) complement, i.e., by using a finite set of minimal elements of the complement. However, with this approach the manipulation of ≼-dc-sets is not obvious; in particular, there is no generic way to compute the post operation applied to a ≼-dc-set by manipulating its complement. Also, as Cover(S0) is not constructible, it is, in some sense, useless to try to represent exactly the ≼-dc-sets encountered during the forward exploration.
On the other hand, we will see in Section 4 that our abstract domain allows us to define an effective and generic abstract post operator.

Proposition 3 below shows that the more elements are put into the finite domain D, the more ≼-dc-sets the abstract domain is able to represent exactly. Proposition 2, which is used in many proofs, provides an equivalent definition of γ[D](P).

Proposition 2. Fix a finite domain D. For every P ∈ DPL(D) we have γ(P) = X \ (D \ P)↑.

Proposition 3. Fix two finite domains D and D' such that D ⊂ D'. For every P ∈ DPL(D), there exists a P' ∈ DPL(D') such that γ[D](P) = γ[D'](P').

Effectiveness. It is worth pointing out that, since we impose finiteness of D, the operators ⊔_D and ⊓_D are effective and ⊑_D is decidable. So, given a finite domain D, the complete lattice DPL(D) provides an effective way to manipulate (infinite) ≼-dc-sets. Even though D is finite, it can be very large, and so the abstract domain may be computationally expensive to manipulate. Compact data structures like Binary Decision Diagrams [22] and Sharing Trees [23, 18] may be necessary in practice.

In Section 5 we need to decide the emptiness of the intersection of an ≼-uc-set and a ≼-dc-set, given as input effective representations of these two sets. We solve this problem using Lemma 3.1 together with the following proposition.

Proposition 4. Fix a finite domain D. For all P ∈ DPL(D) there exists an effective procedure to answer the membership test, i.e., "given c ∈ X, does c belong to γ(P)?".

4 Abstract Interpretation

In this section, we define the forward abstract interpretation of a WSTS using an abstract domain parametrized by D, as defined in the previous section. Let S be a WSTS and D a finite domain. The function post#[S, D] : DPL(D) → DPL(D) is defined as post#[S, D] = λP. (α[D] ∘ post[S] ∘ γ[D])(P), and the function post#[S, D]* : DPL(D) → DPL(D) as post#[S, D]* = λP. ⊔_{i≥0} post#[S, D]^i(P). We shorten post#[S, D] to post# and post#[S, D]* to (post#)* if the WSTS and the finite domain are clear from the context. The following lemma establishes the soundness of our abstract interpretation of WSTS; it follows from the properties of Galois connections:

Lemma 7. Given a WSTS (X, δ, ≼) with I ⊆ X and a finite domain D, (i) post(I) ⊆ (γ ∘ post# ∘ α)(I) and (ii) post*(I) ⊆ (γ ∘ (post#)* ∘ α)(I).

The next proposition shows that we can improve the precision of the analysis by improving the precision of the abstract domain.

Proposition 5 (post# Monotonicity). Given a WSTS S = (X, δ, ≼), two finite domains D, D' with D ⊆ D', and two sets C, C' ⊆ X with C ⊆ C', we have:
(1) (γ[D'] ∘ post#[S, D'] ∘ α[D'])(C) ⊆ (γ[D] ∘ post#[S, D] ∘ α[D])(C'); and
(2) (γ[D'] ∘ post#[S, D']* ∘ α[D'])(C) ⊆ (γ[D] ∘ post#[S, D]* ∘ α[D])(C').

Let us now show that, for a fixed finite domain D, post# is computable for any effective WSTS. We first need the following lemma:

Lemma 8. Given a WSTS S = (X, δ, ≼), for all x, x' ∈ X: x ∈ pre(x'↑) ⇔ x' ∈ post(x)↓.

Proof. x ∈ pre(x'↑) ⇔ ∃x'' : x' ≼ x'' ∧ x → x'' ⇔ x' ∈ post(x)↓. □

We have the following characterization of post#.

Proposition 6. Fix a finite domain D and an effective WSTS S = (X, δ, ≼). For every x ∈ X and P ∈ DPL(D):

  x ∈ post#(P) ⇔ (x ∈ D ∧ ¬(pre(x↑) ⊆ (D \ P)↑)).

The sets pre(x↑)↑ and (D \ P)↑ are ≼-uc-sets which have the finite minor sets minpre(x) and (D \ P), respectively. Lemma 3.2 shows that if both minpre(x) and (D \ P) are finite sets and ≼ is decidable, then we have an effective procedure to decide whether pre(x↑)↑ ⊆ (D \ P)↑, which is equivalent to pre(x↑) ⊆ (D \ P)↑. Furthermore, since the complete lattice DPL(D) is finite, it follows that:

Corollary 1. For any effective IWSTS S0 = (S, x0) and any finite domain D, ((post#[S, D])* ∘ α)(x0) can be effectively computed.
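The characterization of Proposition 6 translates directly into the following minimal sketch, reusing included (and thus member and leq) from the sketch in Section 2; minpre is a placeholder for the effective procedure that Definition 2 requires.

```python
def abstract_post(d, p, minpre):
    """Proposition 6: x in post#(P) iff x in D and pre(x-up) not included
    in (D \\ P)-up."""
    complement = frozenset(d) - frozenset(p)   # finite generators of (D \ P)-up
    return frozenset(x for x in d
                     # pre(x-up)-up is represented by minpre(x) (Lemma 3.2)
                     if not included(minpre(x), complement))
```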
5 Domain Refinements

In this section, we show that the abstract interpretation defined previously can be made sufficiently precise to decide the coverability problem of (effective) IWSTS. We present two ways of achieving completeness of the abstract interpretation, both based on abstract domain refinement. The first (and naïve) way is through enumeration of finite domains; the enumerating algorithm shows that completeness is achievable by systematically enlarging the finite domain D. The second algorithm, which is more sophisticated, enlarges the finite domain D using abstract counter-examples.

5.1 Enumerate Finite Domains

In Section 3, we showed that any ≼-dc-set can be represented using a well-chosen domain (Lemma 6). In particular, the covering set can be represented using a suitable finite domain D. Theorem 1 below asserts that the abstract interpretation of an IWSTS S0, using a finite domain D that allows the covering set of S0 to be represented exactly, leads to the construction of that set.

Theorem 1. Let Cover(S0) be the covering set of an IWSTS S0, and let D be a finite domain such that there is Θ ∈ DPL(D) with γ(Θ) = Cover(S0). For any P ∈ DPL(D) such that P ⊑ Θ, we have (γ ∘ (post#)*)(P) ⊆ Cover(S0).

Proof.
  γ(Θ) = Cover(S0)                            (hypothesis)
  ⇒ (post ∘ γ)(Θ) = post(Cover(S0))
  ⇒ (post ∘ γ)(Θ) ⊆ Cover(S0)                 (monotonicity of post; post(Cover(S0)) ⊆ Cover(S0))
  ⇒ (α ∘ post ∘ γ)(Θ) ⊑ α(Cover(S0))          (monotonicity of α)
  ⇔ post#(Θ) ⊑ α(Cover(S0))                   (def. of post#)
  ⇔ post#(Θ) ⊑ Θ                              (γ(Θ) = Cover(S0), Θ = (α ∘ γ)(Θ))   (3)

Since post# is a monotone function on a complete lattice, (3) shows that for any P ⊑ Θ we have (post#)*(P) ⊑ Θ
  ⇒ (γ ∘ (post#)*)(P) ⊆ γ(Θ)                  (monotonicity of γ)
  ⇔ (γ ∘ (post#)*)(P) ⊆ Cover(S0)             (hypothesis)   □

Thanks to this theorem and the results of [14], Algorithm 1 decides the coverability problem for an effective IWSTS S0 = (S, x0) and a ≼-uc-set bad. The main idea underlying the algorithm is to iteratively analyze an under-approximation of the reachable states (line 1) followed by an over-approximation (line 2). Positive instances of the coverability problem are decided by the under-approximations, and negative instances by the over-approximations. By the enumeration of finite domains Di and Theorem 1, our abstract interpretation is guaranteed to eventually become precise enough for the negative instances.

Algorithm 1. Enumeration
Input: An IWSTS S0 = ((X, δ, ≼), x0) and a set bad ∈ UCS(X)
for Di = D0, D1, . . . , an enumeration of the finite subsets of X do
1   if ∃x0, . . . , xk ∈ Di : x0 → · · · → xk ∧ xk ∈ bad then return reachable
2   else if (γ[Di] ∘ (post#[S, Di])* ∘ α[Di])(x0) ∩ bad = ∅ then return unreachable
end

For this algorithm to be effective, we only need the (mild) additional assumption that the elements of X are enumerable. In the next subsection, we show that this assumption can be dropped, and we propose a more sophisticated way to obtain a finite domain D that is precise enough to solve the coverability problem. Our refinement technique is based on the analysis of the states leading to bad.
5.2 Eliminate Overapproximations Leading to bad

Let us first consider the following lemma, which is a first step towards completeness.

Lemma 9. Given a WSTS (X, δ, ≼) and a set bad ∈ UCS(X), fix a finite domain D and a set P' ∈ DPL(D) such that post#(P') ⊑ P' and min(pre*(bad)) ∩ γ(P') ⊆ D. For every P ∈ DPL(D) such that P ⊑ P', we have γ(P) ∩ pre*(bad) = ∅ ⇒ γ(post#(P)) ∩ pre*(bad) = ∅.

Proof.
  γ(P) ∩ pre*(bad) = ∅
  ⇔ (↓ ∘ post ∘ γ)(P) ∩ pre*(bad) = ∅     (Lemma 8 and pre(pre*(bad)) = pre*(bad))
  ⇒ (α ∘ post ∘ γ)(P) ∩ pre*(bad) = ∅     ((α ∘ post ∘ γ)(P) ⊆ (↓ ∘ post ∘ γ)(P))
  ⇔ post#(P) ∩ pre*(bad) = ∅              (def. of post#)
  ⇔ pre*(bad) ⊆ (X \ post#(P))

So we have established

  γ(P) ∩ pre*(bad) = ∅ ⇒ pre*(bad) ⊆ (X \ post#(P)).   (4)

Moreover, from P ⊑ P' we conclude that post#(P) ⊑ post#(P') (by monotonicity of post#), hence that post#(P) ⊑ P' (as post#(P') ⊑ P'), and finally that γ(post#(P)) ⊆ γ(P') (by monotonicity of γ). Now, let us consider γ(post#(P)):

  γ(post#(P))
  = {c | c↓ ∩ D ⊆ post#(P)}                                    (definition of γ)
  = {c ∈ γ(P') | c↓ ∩ D ⊆ post#(P)}                            (γ(post#(P)) ⊆ γ(P'))
  ⊆ {c ∈ γ(P') | c↓ ∩ min(pre*(bad)) ∩ γ(P') ⊆ post#(P)}       (def. of D)
  = {c ∈ γ(P') | c↓ ∩ min(pre*(bad)) ⊆ post#(P)}               (c ∈ γ(P') implies c↓ ⊆ γ(P'))
  = {c ∈ γ(P') | c↓ ∩ min(pre*(bad)) ∩ (X \ post#(P)) = ∅}
  ⊆ {c ∈ γ(P') | c↓ ∩ min(pre*(bad)) ∩ pre*(bad) = ∅}          (by (4))
  = {c ∈ γ(P') | c↓ ∩ min(pre*(bad)) = ∅}                      (min(A) ⊆ A for A ∈ UCS(X))
  = {c ∈ γ(P') | {c} ∩ pre*(bad) = ∅}                          (Lemma 3.1)
  = {c ∈ γ(P') | c ∉ pre*(bad)}

Hence, γ(post#(P)) ∩ pre*(bad) = ∅. □

Using the previous lemma and induction, we can establish the following theorem.

Theorem 2. Given a WSTS (X, δ, ≼) and a set bad ∈ UCS(X), fix a finite domain D and a set P' ∈ DPL(D) such that post#(P') ⊑ P' and min(pre*(bad)) ∩ γ(P') ⊆ D. For every I ⊆ X such that α(I) ⊑ P', we have I ∩ pre*(bad) = ∅ ⇔ (γ ∘ (post#)* ∘ α)(I) ∩ bad = ∅.

We are now nearly in a position to define our refinement-based algorithm. We first define the following operator, parametrized by O ⊆ X and applied to a finite subset of states T ⊆ X: minpre[S, O](T) = minpre[S](T) ∩ O. We also write minpre[O](T) instead of minpre[S, O](T) if the WSTS is clear from the context. In the remainder of this section we adopt the following convention: a set A acting as the argument of minpre should be read as min(A). A direct consequence of the definition of minpre is the following: for any O ⊆ O' ⊆ X and A ⊆ X we have

  minpre[O]*(A) ⊆ minpre[O']*(A).   (5)

The main ideas underlying our refinement-based algorithm (Algorithm 2) are as follows. In a first approximation, we consider a finite domain D0 that contains a minor set of bad. With this set, we compute a first over-approximation of the reachable states of S0, noted O0. If this over-approximation is fine enough to prove that we are in presence of a negative instance of the problem, then we conclude at line 2. If it is not the case, we compute R'0, which represents all the states within O0 that can reach bad in one step. If x0 is in the upward closure of this set, we conclude that bad is reachable. Otherwise, we refine the finite domain D0 into D1 to ensure that at the next iteration our over-approximation will be more precise (Proposition 5.2) and that (γ[D1] ∘ post#[S, D1] ∘ α[D1])(x0) will not intersect bad. So, we have excluded all spurious counter-examples of length one. We then proceed with this enlarged finite domain.
Since min(pre*(bad)) is computable, Theorem 2 intuitively shows that our algorithm terminates. We formally establish the correctness of our technique in the next lemmas, which prove soundness, completeness, and termination of Algorithm 2.

Algorithm 2. Refinement loop
Input: An IWSTS S0 and a set bad ∈ UCS(X)
Let D0 ⊇ min(bad)
for i = 0, 1, 2, . . . do
1   Compute Ri, defined to be ((post#[S, Di])* ∘ α[Di])(x0); let Oi denote γ[Di](Ri)
2   if Oi ∩ bad = ∅ then return unreachable
    else
3     Compute R'i, defined to be min(⋃_{k=0}^{i+1} minpre[S, Oi]^k(bad))
4     if {x0} ∩ R'i↑ = ∅ then
5       choose Di+1 ⊇ Di ∪ R'i
      else return reachable
end

Lemma 10 (Soundness). If Algorithm 2 says "reachable" then post*(x0) ∩ bad ≠ ∅.

Proof. Let c be the value of the variable i when the algorithm says "reachable". We have minpre[Oc]*(bad)↑ ⊆ minpre[X]*(bad)↑ = pre*(bad): the inclusion follows from Oc ⊆ X, (5) and the monotonicity of ↑, and the equality follows from Lemma 4(b). Since {x0} ∩ pre*(bad) ≠ ∅ iff post*(x0) ∩ bad ≠ ∅, and since line 4 gives {x0} ∩ R'c↑ ≠ ∅ with R'c↑ ⊆ minpre[Oc]*(bad)↑ ⊆ pre*(bad), we conclude that post*(x0) ∩ bad ≠ ∅. □

Lemma 11 (Completeness). If Algorithm 2 says "unreachable" then post*(x0) ∩ bad = ∅.

Proof. Fix a finite domain D; by Lemma 7 we have post*(x0) ⊆ (γ ∘ (post#)* ∘ α)(x0). Let c be the value of the variable i when the algorithm says "unreachable" at line 2. We conclude from (γ[Dc] ∘ (post#[S, Dc])* ∘ α[Dc])(x0) ∩ bad = ∅ that post*(x0) ∩ bad = ∅, which is the desired conclusion. □

Lemma 12 (Termination). Given an effective IWSTS S0 and bad ∈ UCS(X), Algorithm 2 always terminates.

Proof. It is routine to check that each domain Di is finite. Hence, since ⊔_{Di} is computable (because the Di's are finite) and post#[S, Di] is computable by Proposition 6 (notice that α[Di](x0) is computable since ≼ is assumed to be decidable), the fixpoint computation of line 1 finishes after a finite amount of time. Suppose, contrary to our claim, that the algorithm does not terminate. Since each line is evaluated in a finite amount of time, it follows that the algorithm executes the main loop infinitely many times. From line 5, we conclude that the algorithm considers an infinite sequence of finite domains D0 ⊆ D1 ⊆ · · ·. From Proposition 5.2, we know that O0 ⊇ O1 ⊇ · · ·. From Lemma 2, we conclude that there exists i ≥ 0 such that Oi = Oi+1 = · · ·.

Let us consider the iteration i of the algorithm such that Oi = Oi+1 = · · ·. We have the infinite sequence R'i↑ ⊆ R'i+1↑ ⊆ · · ·. From Lemma 2, we conclude that there exists j ≥ i such that R'j↑ = R'j+1↑ = · · ·. Hence, following line 5 of the algorithm, the finite domain contains min(minpre[Oi]*(bad)) (or rather, states equivalent to those of min(minpre[Oi]*(bad))) after the j-th iteration.

Let us now prove that (a) min(minpre[Oj+1]*(bad)) ≡ min(minpre[X]*(bad)) ∩ Oj+1. Indeed, if this were not the case, there would exist l ≥ 0 and c, c' ∈ X such that c ∈ minpre[X]^l(bad), c' ∈ minpre[X](c), c' ∈ Oj+1 and c ∉ Oj+1. Hence post(c') ⊈ Oj+1, since post(c') ∩ c↑ ≠ ∅ and Oj+1 is a ≼-dc-set not containing c. But ∀c'' ∈ Oj+1 : post(c'') ⊆ Oj+1, and from this follows a contradiction. Moreover, (b) min((minpre[X]*(bad))↑) ≡ min(pre*(bad)) holds by Lemma 4(b) and the definition of ≡. We conclude, following line 5 of the algorithm, that Dj+1 contains states equivalent to those of min(pre*(bad)) ∩ Oj+1. By applying Theorem 2, we have {x0} ∩ pre*(bad) = ∅ iff Oj+1 ∩ bad = ∅.
We consider two cases. (i) If {x0} ∩ pre*(bad) = ∅, then Oj+1 ∩ bad = ∅ and the algorithm terminates, since the test of line 2 evaluates to true. (ii) If {x0} ∩ pre*(bad) ≠ ∅, then Oj+1 ∩ bad ≠ ∅. Following (a) and (b), at line 3 of the algorithm R'j+1 ≡ min(pre*(bad)) ∩ Oj+1. Since {x0} ∩ pre*(bad) ≠ ∅, there exists, on account of Lemma 3.3, x ∈ min(pre*(bad)) with x ≼ x0. From x0 ∈ Oj+1 and Oj+1 ∈ DCS(X) it follows that x ∈ Oj+1. We conclude from R'j+1 ≡ min(pre*(bad)) ∩ Oj+1 that [x] ∩ R'j+1 ≠ ∅, hence that {x0} ∩ R'j+1↑ ≠ ∅, and finally that the test of line 4 evaluates to false, which causes the algorithm to terminate. □

Remark 2. Let us notice that the practical efficiency of Algorithm 2 depends on (i) the precision of the over-approximations Oi, and (ii) the time (and space) needed to build those over-approximations. Point (i) is crucial, since rough approximations lead to the computation of min(pre*(bad)), which is time- and space-consuming in practice [23]. Point (ii) is important because an inefficient computation of the over-approximations leads to an inefficient algorithm. Hence, a trade-off between (i) and (ii) must be chosen. This problem exceeds the scope of this paper and will be addressed in future work. To ensure termination we require, at line 5, that the finite domain be enlarged by at least the states of R'i; the algorithm remains correct if we add more states.

6 Illustrations

We have produced a prototype that implements Algorithm 2. We describe in this section the execution of that prototype when applied to a toy example. The example IWSTS S0 is represented by a Petri net (see [8] for details), depicted in Fig. 2, which models a very simple mutual exclusion protocol. We want to check the safety of the protocol, that is, check that there is never more than one process in the critical sections. The markings that violate the property, denoted bad, are given by {⟨0,0,0,1,1⟩, ⟨0,0,0,0,2⟩, ⟨0,0,0,2,0⟩}↑.

Fig. 2. A simple mutual exclusion protocol: a Petri net with places p1 (wait), p2, p3, p4 (cs1), p5 (cs2) and transitions t0, . . . , t4. The processes (the tokens in place p1) can access some critical section (place p4 or p5) provided they acquired some lock (the tokens in places p2 and p3). The initial marking is given by ⟨0, 1, 1, 0, 0⟩. Transition t0 spawns processes.

It is worth pointing out that we want to establish the safety of the protocol for any number of processes taking part in it (recall that t0 spawns processes).
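For concreteness, the set bad and the initial marking can be encoded with the minor-set helpers sketched in Section 2; the tuple layout (p1, . . . , p5) is our own convention.

```python
# Markings are tuples (p1, p2, p3, p4, p5); member/leq are from Section 2.
BAD_MINOR = {(0, 0, 0, 1, 1), (0, 0, 0, 0, 2), (0, 0, 0, 2, 0)}
INIT = (0, 1, 1, 0, 0)

# A marking violates mutual exclusion iff it covers one of the three minimal
# bad markings, i.e. iff it lies in the upward closure of BAD_MINOR.
assert not member(INIT, BAD_MINOR)           # the initial marking is safe
assert member((3, 0, 0, 1, 1), BAD_MINOR)    # both critical sections occupied
```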
Again we perform a refinement step by (i) computing R′1 = min(bad) ∪ {⟨0,1,1,0,1⟩, ⟨0,1,1,1,0⟩, ⟨2,2,1,0,0⟩, ⟨2,1,2,0,0⟩} (which corresponds to min(bad ∪ pre(bad) ∪ pre²(bad))) and (ii) adding the tuples of R′1↓ to those of D1 to obtain D2.

Iteration 3 (i = 2). The fixpoint computation of line 1 finishes with a set R2 such that the test of line 2 (O2 ∩ bad = ∅) succeeds, and the system is proved to be safe. Indeed, O2 = {⟨p1, p2, p3, p4, p5⟩ ∈ ℕ⁵ | (⋀_{i=2}^{5} pi ≤ 1) ∧ p4 + p5 ≤ 1 ∧ p3 + p4 ≤ 1 ∧ p2 + p5 ≤ 1}, which is equal to Cover(S0). Since Cover(S0) is, in general, not computable ([10]), this equality does not always hold. Notice that pre∗(bad) is computed in five iterations with the classical algorithm of [17]. Hence, the forward analysis allows us to drastically cut the backward search. We hope this gain will also appear on many practical examples.
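For readers who want to experiment with the backward analysis that the refinement loop is designed to curtail, here is a minimal Python sketch of the classical computation of min(pre∗(bad)) for Petri nets in the style of [17]. It is not the authors' prototype; the encoding of transitions, all names, and (in the usage example) the arc structure of the net of Fig. 2 are our own reconstructions.

```python
def minimal(markings):
    """Keep only the componentwise-minimal markings of a finite set of tuples."""
    ms = set(markings)
    return {m for m in ms
            if not any(n != m and all(a <= b for a, b in zip(n, m)) for n in ms)}

def min_pre(basis, transitions):
    """One-step minimal predecessors of the upward closure of `basis`.
    A transition is a pair (inp, out): firing needs >= inp tokens per place
    and replaces them by out.  To cover m after firing, a marking must hold,
    in each place, at least max(inp[p], m[p] - out[p] + inp[p]) tokens."""
    preds = set()
    for m in basis:
        for inp, out in transitions:
            preds.add(tuple(max(i, c - o + i) for i, c, o in zip(inp, m, out)))
    return preds

def backward_coverability(bad_basis, transitions, init):
    """Saturate bad under min_pre; report whether init can cover bad."""
    basis = minimal(bad_basis)
    while True:
        new = minimal(basis | min_pre(basis, transitions))
        if new == basis:
            break
        basis = new
    covers = any(all(i >= b for i, b in zip(init, m)) for m in basis)
    return covers, basis

# The net of Fig. 2 as we reconstruct it from the text (an assumption, not
# the paper's exact arc weights); places ordered (p1, p2, p3, p4, p5).
T = [((0,0,0,0,0), (1,0,0,0,0)),   # t0: spawn a process into p1
     ((1,1,1,0,0), (0,0,1,1,0)),   # t1: enter cs1 (takes the p2 lock)
     ((1,1,1,0,0), (0,1,0,0,1)),   # t2: enter cs2 (takes the p3 lock)
     ((0,0,0,1,0), (1,1,0,0,0)),   # t3: leave cs1
     ((0,0,0,0,1), (1,0,1,0,0))]   # t4: leave cs2
bad = {(0,0,0,1,1), (0,0,0,0,2), (0,0,0,2,0)}
print(backward_coverability(bad, T, (0,1,1,0,0)))  # expect False: safe
```

Under this guessed encoding, the first saturation step reproduces the basis elements ⟨1,1,1,0,1⟩ and ⟨1,1,1,1,0⟩ reported for iteration 1 above.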
References

1. Alur, R., Dill, D.: A theory of timed automata. Theoretical Computer Science 126 (1994) 183–236
2. Henzinger, T.A.: The theory of hybrid automata. In: Proceedings of LICS, IEEE Computer Society Press (1996) 278–292
3. Abdulla, P.A., Jonsson, B.: Verifying programs with unreliable channels. Inf. Comput. 127 (1996) 91–101
4. Abdulla, P., Annichini, A., Bouajjani, A.: Symbolic verification of lossy channel systems: Application to the bounded retransmission protocol. In: Proceedings of TACAS. Volume 1579 of LNCS., Springer (1999) 208–222
5. Delzanno, G., Raskin, J.F., Van Begin, L.: Towards the automated verification of multithreaded Java programs. In: Proceedings of TACAS. Volume 2280 of LNCS., Springer (2002) 173–187
6. Bardin, S., Finkel, A., Leroux, J., Petrucci, L.: FAST: Fast acceleration of symbolic transition systems. In: Proceedings of CAV. Volume 2725 of LNCS., Springer (2003) 118–121
7. Esparza, J., Finkel, A., Mayr, R.: On the verification of broadcast protocols. In: Proceedings of LICS, IEEE Computer Society Press (1999) 352–359
8. Reisig, W.: Petri Nets. An Introduction. Springer (1986)
9. Ciardo, G.: Petri nets with marking-dependent arc multiplicity: properties and analysis. In: Proc. of ATPN. Volume 815 of LNCS., Springer (1994) 179–198
10. Dufourd, C., Finkel, A., Schnoebelen, P.: Reset nets between decidability and undecidability. In: Proceedings of ICALP. Volume 1443 of LNCS., Springer (1998) 103–115
11. Raskin, J.F., Van Begin, L.: Petri nets with non-blocking arcs are difficult to analyse. In: Proceedings of INFINITY. Volume 96 of ENTCS., Elsevier (2003)
12. Emerson, E.A., Namjoshi, K.S.: On model checking for non-deterministic infinite-state systems. In: Proc. of LICS, IEEE Computer Society Press (1998) 70–80
13. Higman, G.: Ordering by divisibility in abstract algebras. Proc. London Math. Soc. (3) 2 (1952) 326–336
14. Geeraerts, G., Raskin, J.F., Van Begin, L.: Expand, Enlarge and Check: new algorithms for the coverability problem of WSTS. In: Proceedings of FSTTCS. Volume 3328 of LNCS., Springer (2004) 287–298
15. Abdulla, P., Deneux, J., Mahata, P., Nylen, A.: Forward reachability analysis of timed Petri nets. In: Proceedings of FORMATS-FTRTFT. Volume 3253 of LNCS., Springer (2004) 343–362
16. Ganty, P., Raskin, J.F., Van Begin, L.: A complete abstract interpretation framework for coverability properties of WSTS. Technical Report 2005.57, Centre Fédéré en Vérification (CFV) (2005)
17. Abdulla, P.A., Cerans, K., Jonsson, B., Tsay, Y.K.: General decidability theorems for infinite-state systems. In: Proceedings of LICS, IEEE Computer Society Press (1996) 313–321
18. Delzanno, G., Raskin, J.F., Van Begin, L.: Covering sharing trees: a compact data structure for parameterized verification. Software Tools for Technology Transfer (STTT) 5 (2004) 268–297
19. Finkel, A., Schnoebelen, P.: Well-structured transition systems everywhere! Theoretical Computer Science 256 (2001) 63–92
20. Finkel, A.: Reduction and covering of infinite reachability trees. Inf. Comput. 89 (1990) 144–179
21. Esparza, J., Ganty, P., Schwoon, S.: Locality-based abstractions. In: Proceedings of SAS. Volume 3672 of LNCS., Springer (2005) 118–134
22. Bryant, R.E.: Graph-based algorithms for boolean function manipulation. IEEE Trans. Computers 35 (1986) 677–691
23. Van Begin, L.: Efficient Verification of Counting Abstractions for Parametric Systems. PhD thesis, Université Libre de Bruxelles (2003)

Complexity Results on Branching-Time Pushdown Model Checking

Laura Bozzelli

Università di Napoli Federico II, Via Cintia, 80126 - Napoli, Italy

Abstract. The model checking problem of pushdown systems (PMC problem, for short) against standard branching temporal logics has been intensively studied in the literature. In particular, for the modal µ-calculus, the most powerful branching temporal logic used for verification, the problem is known to be Exptime-complete (even for a fixed formula). The problem remains Exptime-complete also for the logic CTL, which corresponds to a fragment of the alternation-free modal µ-calculus. However, the exact complexity in the size of the pushdown system (for a fixed CTL formula) is an open question: it lies somewhere between Pspace and Exptime. To the best of our knowledge, the PMC problem for CTL∗ has not been investigated so far. In this paper, we show that this problem is 2Exptime-complete. Moreover, we prove that the program complexity of the PMC problem against CTL (i.e., the complexity of the problem in terms of the size of the system) is Exptime-complete.

1 Introduction

Model checking is a useful method to verify automatically the correctness of a system with respect to a desired behavior, by checking whether a mathematical model of the system satisfies a formal specification of this behavior, given by a formula in a suitable propositional temporal logic. There are two types of temporal logics: linear and branching. In linear temporal logics, each moment in time has a unique possible future (formulas are interpreted over linear sequences corresponding to single computations of the system), while in branching temporal logics, each moment in time may split into several possible futures (formulas are interpreted over infinite trees, which describe all the possible computations of the system). The size of an instance of a model checking problem depends on two parameters: the size of the finite formal description of the given system and the size of the formula. In practice, the formula is normally very small, while the description of the system is often very large. Therefore, the complexity of the problem in terms of the size of the system (called program complexity) is very important in practice. Traditionally, model checking is applied to finite-state systems, typically modelled by labelled state-transition graphs. Recently, the investigation of model-checking techniques has been extended to infinite-state systems. An active field of research is model checking of infinite-state sequential systems.
These are systems in which each state carries a finite, but unbounded, amount of information, e.g., a pushdown store. The origin of this research is the result of Muller and Schupp concerning the decidability of the monadic second-order theory of context-free systems [MS85]. This result can be extended to pushdown systems [Cau96], and it implies decidability of the model checking problem for all those logics (modal µ-calculus, CTL∗, CTL, etc.) which have effective translations to monadic second-order logic. As this general decidability result gives a non-elementary upper bound for the complexity of model checking, researchers sought decidability results of elementary complexity.

Concerning pushdown systems, model checking with branching-time logics is quite hard. In particular, Walukiewicz [Wal96] has shown that model checking these systems with respect to the modal µ-calculus, the most powerful branching temporal logic used for verification, is Exptime-complete. Even for a fixed formula in the alternation-free modal µ-calculus, the problem is Exptime-hard in the size of the pushdown system. The problem remains Exptime-complete also for the logic CTL [Wal00], which corresponds to a fragment of the alternation-free modal µ-calculus. However, the exact complexity in the size of the system (for a fixed CTL formula) is an open problem: it lies somewhere between Pspace and Exptime [BEM97]. In [Wal00], Walukiewicz has shown that even for the simple branching-time logic EF (a fragment of CTL) the problem is quite hard, since it is Pspace-complete (even for a fixed formula). For other branching-time temporal logics such as EG, UB (which are fragments of CTL) and CTL∗ (which subsumes both CTL and LTL), the problem is still open. To the best of our knowledge, the pushdown model checking problem for CTL∗ has not been investigated so far. For standard linear temporal logics, model checking pushdown systems with LTL and the linear-time µ-calculus is Exptime-complete [BEM97]. However, the problem is polynomial in the size of the pushdown system. It follows that the problem is only slightly harder than for finite-state systems, where it is Pspace-complete but polynomial for any fixed formula [SC85, Var88]. For optimal pushdown model-checking algorithms, see also [EHRS00, EKS03, PV04, AEM04].

In this paper we study the pushdown model checking problem (PMC problem, for short) against CTL∗ and the program complexity of the PMC problem against CTL. In particular, we state the following two results:

– The PMC problem against CTL∗ is 2Exptime-complete (and Exptime-complete in the size of the system).
– The program complexity of the PMC problem w.r.t. CTL is Exptime-complete.

In order to solve the PMC problem for CTL∗ we exploit an automata-theoretic approach. In particular, we propose an exponential time reduction (in the size of the formula) to the emptiness problem of alphabet-free alternating parity pushdown automata. The emptiness problem for this class of automata can be solved by a construction similar to that given in [KPV02] to solve the emptiness problem for nondeterministic parity pushdown tree automata (the algorithm in [KPV02] is based on a polynomial reduction to the emptiness of two-way alternating parity finite-state tree automata, which is known to be decidable in exponential time [Var98]).
2Exptime-hardness is shown by a technically non-trivial reduction from the word problem for Expspace-bounded alternating Turing Machines. Exptime-hardness of the pushdown model checking problem against CTL was shown by Walukiewicz [Wal00] using a reduction from the word problem for Pspace-bounded alternating Turing Machines. We use the basic ideas of the construction in [Wal00] in order to prove that the program complexity of the problem (i.e., assuming the CTL formula is fixed) is still Exptime-hard.

2 Preliminaries

In this section we recall the syntax and semantics of CTL∗ and CTL [EH86, CE81]. Also, we define pushdown systems and the model checking problem.

CTL∗ and CTL logics. The logic CTL∗ is a branching-time temporal logic [EH86], where a path quantifier, E ("for some path") or A ("for all paths"), can be followed by an arbitrary linear-time formula, allowing boolean combinations and nesting, over the usual linear temporal operators X ("next"), U ("until"), F ("eventually"), and G ("always"). There are two types of formulas in CTL∗: state formulas, whose satisfaction is related to a specific state, and path formulas, whose satisfaction is related to a specific path. Formally, for a finite set AP of proposition names, the class of state formulas ϕ and the class of path formulas θ are defined by the following syntax:

ϕ := prop | ¬ϕ | ϕ ∧ ϕ | A θ | E θ
θ := ϕ | ¬θ | θ ∧ θ | X θ | θ U θ

where prop ∈ AP. The set of state formulas ϕ forms the language CTL∗. The other operators can be introduced as abbreviations in the usual way: for instance, F θ abbreviates true U θ and G θ abbreviates ¬F¬θ. The Computation Tree Logic CTL [CE81] is a restricted subset of CTL∗, obtained by restricting the syntax of path formulas θ as follows: θ := Xϕ | ϕ U ϕ. This means that X and U must be immediately preceded by a path quantifier.

The models for the logic CTL∗ are labelled graphs ⟨W, R, µ⟩, where W is a countable set of vertices, R ⊆ W × W is the edge relation, and µ : W → 2^AP maps each vertex w ∈ W to the set of atomic propositions that hold in w. Such labelled graphs are called transition systems (TS, for short) here. In this context vertices are also called (global) states. For (w, w′) ∈ R, we say that w′ is a successor of w. A path is a (finite or infinite) sequence of vertices π = w0, w1, . . . such that (wi, wi+1) ∈ R for any i ≥ 0. We denote the suffix wi, wi+1, . . . of π by π^i, and the i-th vertex of π by π(i). A maximal path is either an infinite path or a finite path leading to a vertex without successors. Let G = ⟨W, R, µ⟩ be a TS, w ∈ W, and π be a maximal path of G. For a state (resp., path) formula ϕ (resp., θ), the satisfaction relation (G, w) |= ϕ (resp., (G, π) |= θ), meaning that ϕ (resp., θ) holds at state w (resp., holds along π) in G, is defined by induction. The clauses for proposition letters, negation, and conjunction are standard. For the other constructs we have:

– (G, w) |= A θ iff for each maximal path π in G from w, (G, π) |= θ;
– (G, w) |= E θ iff there exists a maximal path π from w such that (G, π) |= θ;
– (G, π) |= ϕ iff (G, π(0)) |= ϕ;
– (G, π) |= Xθ iff π(1) is defined and (G, π^1) |= θ;
– (G, π) |= θ1 U θ2 iff there exists i ≥ 0 such that (G, π^i) |= θ2 and for all 0 ≤ j < i, we have (G, π^j) |= θ1.
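These satisfaction clauses can be made concrete on finite systems. Below is a minimal sketch, under assumptions of ours: a finite TS with a total edge relation (so maximal paths are infinite), formulas restricted to the CTL fragment, and a tuple-based AST of our own design. It illustrates the semantics only; it is not one of the pushdown model-checking algorithms of this paper.

```python
def sat(phi, states, succ, label):
    """Return the set of states of a finite TS where the CTL formula holds.
    states: a set; succ(s): iterable of successors (assumed nonempty);
    label(s): set of proposition names; phi: nested tuples, e.g.
    ('E', ...) is written here as ('EX', ...), ('EU', ...), ('EG', ...)."""
    op = phi[0]
    if op == 'prop':
        return {s for s in states if phi[1] in label(s)}
    if op == 'not':
        return states - sat(phi[1], states, succ, label)
    if op == 'and':
        return sat(phi[1], states, succ, label) & sat(phi[2], states, succ, label)
    if op == 'EX':
        good = sat(phi[1], states, succ, label)
        return {s for s in states if any(t in good for t in succ(s))}
    if op == 'EU':                      # least fixpoint for E[phi1 U phi2]
        s1, s2 = (sat(phi[i], states, succ, label) for i in (1, 2))
        res = set(s2)
        while True:
            new = {s for s in s1 - res if any(t in res for t in succ(s))}
            if not new:
                return res
            res |= new
    if op == 'EG':                      # greatest fixpoint for EG phi1
        res = sat(phi[1], states, succ, label)
        while True:
            bad = {s for s in res if not any(t in res for t in succ(s))}
            if not bad:
                return res
            res -= bad
    raise ValueError(op)
```

The remaining CTL operators (AX, AU, AF, AG) are derivable from these by the usual dualities.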
Pushdown systems. A pushdown system (PDS, for short) is a tuple S = ⟨AP, Γ, P, ∆, L⟩, where AP is a finite set of proposition names, Γ is a finite stack alphabet, P is a finite set of (control) states, ∆ ⊆ (P × (Γ ∪ {γ0})) × (P × Γ∗) is a finite set of transition rules (where γ0 ∉ Γ is the stack bottom symbol), and L : P × (Γ ∪ {γ0}) → 2^AP is a labelling function. A configuration is a pair (p, α), where p ∈ P is a control state and α ∈ Γ∗ · γ0 is a stack content. For each (p, B) ∈ P × (Γ ∪ {γ0}), we denote by nextS(p, B) the finite set (possibly empty) of the pairs (p′, β) such that ((p, B), (p′, β)) ∈ ∆. The size |S| of S is |P| + |∆|, with |∆| = Σ_{((p,B),(p′,β))∈∆} |β|. The semantics of a PDS S = ⟨AP, Γ, P, ∆, L⟩ is described by a TS GS = ⟨W, R, µ⟩, where W is the set of pushdown configurations, for all (p, B · α) ∈ W with B ∈ Γ ∪ {γ0}, µ(p, B · α) = L(p, B), and R is defined as follows:

– ((p, B · α), (p′, β)) ∈ R iff there is ((p, B), (p′, β′)) ∈ ∆ such that either B ∈ Γ and β = β′ · α, or B = γ0 (note that then α = ε) and β = β′ · γ0 (note that every transition that removes the bottom symbol γ0 also pushes it back).

For each configuration w ∈ W, we denote by bdS(w) the number of successors of w (note that bdS(w) is finite). The pushdown model checking problem (PMC problem, for short) against CTL (resp., CTL∗) is to decide, for a given PDS S, an initial configuration w0 of S, and a CTL (resp., CTL∗) formula ϕ, whether (GS, w0) |= ϕ.

3 Tree Automata

In order to solve the PMC problem for CTL∗, we use an automata-theoretic approach; in particular, we exploit the formalisms of Alternating Parity (finite-state) Tree automata (APT, for short) [MS87, EJ91] and Alphabet-free alternating parity pushdown automata (PD-APA, for short).

Let ℕ be the set of positive integers. A tree T is a subset of ℕ∗ such that if i · x ∈ T for some i ∈ ℕ and x ∈ ℕ∗, then also x ∈ T and, for all 1 ≤ j < i, j · x ∈ T. The elements of T are called nodes and the empty word ε is the root of T. For x ∈ T, the set of children (or successors) of x (in T) is children(T, x) = {i · x ∈ T | i ∈ ℕ}. For x ∈ T, a (full) path π of T from x is a minimal set π ⊆ T such that x ∈ π and, for each y ∈ π such that children(T, y) ≠ ∅, there is exactly one node in children(T, y) belonging to π. For k ≥ 1, the (complete) k-ary tree is the tree {1, . . . , k}∗. For an alphabet Σ, a Σ-labelled tree is a pair ⟨T, V⟩, where T is a tree and V : T → Σ maps each node of T to a symbol in Σ. Note that ⟨T, V⟩ corresponds to the labelled graph GT = ⟨T, R, V⟩ where (x, y) ∈ R iff y ∈ children(T, x). If Σ = 2^AP, then for a given CTL∗ formula ϕ over AP, we say that ⟨T, V⟩ satisfies ϕ if (GT, ε) |= ϕ.

For a set X, let B+(X) be the set of positive boolean formulas over X. Elements of X are called atoms. For Y ⊆ X and ψ ∈ B+(X), we say that Y satisfies ψ iff assigning true to all elements of Y and false to all elements of X \ Y makes ψ true. For k ≥ 1, we denote by [k] the set {1, . . . , k}.

Alternating Parity (finite-state) Tree automata (APT). We describe APT over (complete) k-ary trees for a given k ≥ 1. Formally, an APT is a tuple A = ⟨Σ, Q, q0, δ, F⟩, where Σ is a finite input alphabet, Q is a finite set of states, q0 ∈ Q is an initial state, δ : Q × Σ → B+([k] × Q) is a transition function, and F is a parity acceptance condition [EJ91], i.e., F = {F1, . . . , Fm} is a sequence of subsets of Q with F1 ⊆ F2 ⊆ . . . ⊆ Fm = Q (m is called the index of A).
A run of A on a Σ-labelled k-ary tree ⟨T, V⟩ (where T = [k]∗) is a labelled tree ⟨Tr, r⟩ in which each node is labelled by an element of T × Q. A node in Tr labelled by (x, q) describes a copy of the automaton that is in the state q and reads the node x of T. Note that many nodes of Tr can correspond to the same node of T. The labels of a node and of its children (successors) have to satisfy the transition function. Formally, a run over ⟨T, V⟩ is a T × Q-labelled tree ⟨Tr, r⟩ such that r(ε) = (ε, q0) and, for all y ∈ Tr with r(y) = (x, q), the following holds:

– there is a (possibly empty) set {(h1, q1), . . . , (hn, qn)} ⊆ [k] × Q satisfying δ(q, V(x)) such that for each 1 ≤ j ≤ n, j · y ∈ Tr and r(j · y) = (hj · x, qj).

Note that several copies of the automaton may go to the same direction and that the automaton is not required to send copies to all the directions. The automaton A is symmetric if, for each (q, σ) ∈ Q × Σ, δ(q, σ) is a positive boolean combination of sub-formulas (called generators) either of the form ⋁_{i=1}^{k}(i, q′) or of the form ⋀_{i=1}^{k}(i, q′) (note that q′ is independent of the specific direction i). The size |A| of a symmetric APT A is |Q| + |δ| + |F|, where |δ| = Σ_{(q,σ)∈Q×Σ} |δ(q, σ)| and |δ(q, σ)| is the length of the formula obtained from δ(q, σ) by considering each generator occurring in δ(q, σ) as an atomic proposition.

For a run ⟨Tr, r⟩ over ⟨T, V⟩ and an infinite path π ⊆ Tr, let infr(π) ⊆ Q be the set such that q ∈ infr(π) iff there are infinitely many y ∈ π such that r(y) ∈ T × {q}. For the parity acceptance condition F = {F1, . . . , Fm}, π is accepting if there is an even 1 ≤ i ≤ m such that infr(π) ∩ Fi ≠ ∅ and, for all j < i, infr(π) ∩ Fj = ∅. A run ⟨Tr, r⟩ is accepting if all its infinite paths are accepting. The automaton A accepts an input tree ⟨T, V⟩ iff there is an accepting run of A over ⟨T, V⟩. The language of A, denoted L(A), is the set of Σ-labelled (complete) k-ary trees accepted by A. It is well known that formulas of CTL∗ can be translated to tree automata. In particular, we are interested in optimal translations to symmetric APT.

Lemma 1 ([KVW00]). Given a CTL∗ formula ϕ over AP and k ≥ 1, we can construct a symmetric APT of size O(2^|ϕ|) and index O(|ϕ|) that accepts exactly the set of 2^AP-labelled complete k-ary trees satisfying ϕ. ([KVW00] gives a translation from CTL∗ to hesitant alternating tree automata, which are a special case of parity alternating tree automata.)

Alphabet-free alternating parity pushdown automata (PD-APA). A PD-APA is a tuple P = ⟨Γ, P, p0, α0, ρ, F⟩, where Γ is a finite stack alphabet, P is a finite set of (control) states, p0 ∈ P is an initial state, α0 ∈ Γ∗ · γ0 is an initial stack content, ρ : P × (Γ ∪ {γ0}) → B+(P × Γ∗) is a transition function, and F = {F1, . . . , Fm} is a parity acceptance condition over P. Intuitively, when the automaton P is in state p and the stack contains a word B · α ∈ Γ∗ · γ0, then P chooses a (possibly empty) finite set {(p1, β1), . . . , (pn, βn)} ⊆ P × Γ∗ satisfying ρ(p, B) and splits in n copies such that, for each 1 ≤ j ≤ n, the j-th copy moves to state pj and updates the stack content by removing B and pushing βj. Formally, a run of P is a P × Γ∗ · γ0-labelled tree ⟨Tr, r⟩ such that r(ε) = (p0, α0) and, for all y ∈ Tr with r(y) = (p, B · α) and B ∈ Γ ∪ {γ0}, the following holds:
– there is a (possibly empty) finite set {(p1, β1), . . . , (pn, βn)} ⊆ P × Γ∗ satisfying ρ(p, B) such that for each 1 ≤ j ≤ n, j · y ∈ Tr and r(j · y) = (pj, βj · α) if B ≠ γ0, and r(j · y) = (pj, βj · γ0) otherwise (note that in this case α = ε).

The notion of accepting path π ⊆ Tr is defined as for APT, with infr(π) defined as follows: infr(π) ⊆ P is the set such that p ∈ infr(π) iff there are infinitely many y ∈ π for which r(y) ∈ {p} × Γ∗ · γ0. A run ⟨Tr, r⟩ is accepting if every infinite path π ⊆ Tr is accepting. The emptiness problem for PD-APA is to decide, for a given PD-APA, the existence of an accepting run. For (p, α) ∈ P × Γ∗, the size of (p, α) is |α|. The size |ρ| of the transition function is given by Σ_{(p,B)∈P×(Γ∪{γ0})} |ρ(p, B)|, where |ρ(p, B)| is the sum of the sizes of the occurrences of atoms in ρ(p, B).

In the following we are interested in the emptiness problem for PD-APA. In [KPV02], an optimal algorithm is given to solve the emptiness problem for nondeterministic parity pushdown tree automata. This algorithm is based on a polynomial reduction to the emptiness of two-way alternating parity tree automata, which is known to be decidable in exponential time [Var98]. By a similar reduction we obtain the following.

Proposition 1. The emptiness problem for PD-APA with index m and transition function ρ is solvable in time exponential in m · |ρ|.

4 Upper Bound for CTL∗

We solve the PMC problem for CTL∗ using an automata-theoretic approach. We fix a PDS S = ⟨AP, Γ, P, ∆, L⟩, an initial configuration w0 = (p0, α0) of S, and a CTL∗ formula ϕ. The unwinding of the TS GS = ⟨W, R, µ⟩ from w0 induces a W-labelled tree ⟨TS, VS⟩: the root of TS is associated with the initial configuration w0, and each node x ∈ TS labelled by w ∈ W has bdS(w) successors, each associated with a successor w′ of w. (Assuming that W is ordered, there is indeed only a single such tree. Since CTL∗ formulas cannot distinguish between trees obtained by different orders, we do not lose generality by considering a particular order.) In the following, we sometimes view ⟨TS, VS⟩ as a 2^AP-labelled tree, taking the label of a node x to be µ(VS(x)) instead of VS(x). Which interpretation is intended will be clear from the context. Evidently, (GS, w0) |= ϕ iff ⟨TS, VS⟩ satisfies ϕ. Therefore, the model checking problem of S against ϕ can be reduced to checking whether ⟨TS, VS⟩ belongs to the language of the APT (whose existence is guaranteed by Lemma 1) accepting the tree-models of ϕ. However, note that the branching degree of TS is not uniform and, in particular, some nodes of TS may not have successors. We solve this problem as follows. Let k = max{bdS(w) | w ∈ W} (note that k is finite and can be trivially computed from the transition relation ∆ of S). We can encode the computation tree ⟨TS, VS⟩ as a 2^(AP∪{t}) ∪ {⊥}-labelled complete k-ary tree (where ⊥ and t are fresh proposition names not belonging to AP) in the following way: first, we add the proposition t to the label of all leaf nodes (corresponding to configurations without successors) of the tree TS; second, for each node x ∈ TS with d children 1 · x, . . . , d · x (note that 0 ≤ d ≤ k), we add the children (d + 1) · x, . . . , k · x and label these new nodes with ⊥; finally, for each node x labelled by ⊥ we recursively add k children labelled by ⊥. Let ⟨[k]∗, V′S⟩ be the tree thus obtained.
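The padding construction just described can be sketched operationally. Below is a small Python rendering under assumptions of ours: trees are nested (label, children) pairs, labels are sets of propositions, and, since ⊥-subtrees are infinite, we unfold only to a bounded depth (labels at the cutoff depth are left as-is).

```python
BOT = '⊥'   # stand-in for the fresh label ⊥

def pad(node_label, children, k, depth):
    """Complete a tree to branching degree k.  node_label: a set of
    propositions (or BOT); children: list of (label, children) pairs.
    Leaves of the original tree get the extra proposition 't'; missing
    children are replaced by ⊥-nodes, which recursively get k ⊥-children."""
    if depth == 0:
        return (node_label, [])
    if node_label == BOT:
        return (BOT, [pad(BOT, [], k, depth - 1)] * k)
    lab = node_label | {'t'} if not children else node_label
    subs = [pad(c_lab, c_ch, k, depth - 1) for (c_lab, c_ch) in children]
    subs += [pad(BOT, [], k, depth - 1)] * (k - len(children))
    return (lab, subs)

# Example: a root with one child, completed to a binary (k = 2) tree.
print(pad(frozenset({'p'}), [(frozenset({'q'}), [])], 2, 2))
```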
Since a node labelled by ⊥ stands for a node that actually does not exist, we have to take this into account when we interpret CTL∗ formulas over the tree ⟨[k]∗, V′S⟩. This means that we have to consider only the paths in this tree (called "legal" paths) that either never visit a node labelled by ⊥ or contain a terminal node (i.e., a node labelled by t). Note that a path is not "legal" iff it satisfies the formula ¬t U ⊥. In order to achieve this, we define inductively a function f : CTL∗ formulas → CTL∗ formulas such that f(ϕ) restricts path quantification to only "legal" paths (a transcription of f as code appears at the end of this section):

• f(prop) = prop for any proposition prop ∈ AP;
• f(¬ϕ) = ¬f(ϕ);
• f(ϕ1 ∧ ϕ2) = f(ϕ1) ∧ f(ϕ2);
• f(Eϕ) = E((G¬⊥) ∧ f(ϕ)) ∨ E((F t) ∧ f(ϕ));
• f(Aϕ) = A((¬t U ⊥) ∨ f(ϕ));
• f(Xϕ) = X(f(ϕ) ∧ ¬⊥);
• f(ϕ1 U ϕ2) = (f(ϕ1) ∧ ¬⊥) U (f(ϕ2) ∧ ¬⊥).

Note that |f(ϕ)| = O(|ϕ|). By definition of f, it follows that ⟨TS, VS⟩ satisfies ϕ (i.e., (GS, w0) |= ϕ) iff ⟨[k]∗, V′S⟩ satisfies f(ϕ). Let Af(ϕ) = ⟨2^(AP∪{t}) ∪ {⊥}, Q, q0, δ, F⟩, with F = {F1, . . . , Fm}, be the symmetric APT (whose existence is guaranteed by Lemma 1) accepting exactly the 2^(AP∪{t}) ∪ {⊥}-labelled complete k-ary trees that satisfy f(ϕ). We have to check whether ⟨[k]∗, V′S⟩ belongs to the language L(Af(ϕ)). We reduce this problem to the emptiness of a PD-APA P = ⟨Γ, (P ∪ {⊥}) × Q, (p0, q0), α0, ρ, F′⟩, which is defined as follows. The states of P consist either of pairs of states of S and states of Af(ϕ), or of pairs of the form (⊥, q) where q is a state of Af(ϕ). Intuitively, when the automaton P is in state (p, q) ∈ P × Q with stack content α, and (p, α) is a configuration associated with some node x of ⟨TS, VS⟩, then P simulates the behaviour of A starting from state q on the input tree given by the subtree of ⟨[k]∗, V′S⟩ rooted at node x. Moreover, in state (⊥, q), P simulates the behaviour of A from state q on the input tree in which all nodes are labelled by ⊥. The parity acceptance condition F′ is given by {(P ∪ {⊥}) × F1, . . . , (P ∪ {⊥}) × Fm}. Finally, the transition function ρ is defined as follows:

– for each (p, q) ∈ P × Q and B ∈ Γ ∪ {γ0}, ρ((p, q), B) is defined as follows. Let nextS(p, B) = {(p1, α1), . . . , (pd, αd)} (note that 0 ≤ d ≤ k). If d > 0 (resp., d = 0), then ρ((p, q), B) is obtained from the formula δ(q, L(p, B)) (resp., δ(q, L(p, B) ∪ {t})) by replacing each generator occurring in it of the form ⋁_{i=1}^{k}(i, q′) with ⋁_{i=1}^{d}((pi, q′), αi) ∨ ⋁_{i=d+1}^{k}((⊥, q′), ε), and each generator of the form ⋀_{i=1}^{k}(i, q′) with ⋀_{i=1}^{d}((pi, q′), αi) ∧ ⋀_{i=d+1}^{k}((⊥, q′), ε);
– for each q ∈ Q and B ∈ Γ ∪ {γ0}, ρ((⊥, q), B) is obtained from the formula δ(q, ⊥) by replacing each generator occurring in it of the form C_{i=1}^{k}(i, q′), where C ∈ {⋁, ⋀}, with C_{i=1}^{k}((⊥, q′), ε).

It is not hard to see that P has an accepting run iff ⟨[k]∗, V′S⟩ ∈ L(Af(ϕ)). Note that the size |ρ| of the transition function of P is bounded by k · |δ| · |∆|. By Lemma 1, it follows that P has index O(|ϕ|) and |ρ| is bounded by O(k · 2^|ϕ| · |∆|). Then, by Proposition 1 we obtain the main result of this section.

Theorem 1. Given a PDS S = ⟨AP, Γ, P, ∆, L⟩, a configuration w0 of S, and a CTL∗ formula ϕ, the model checking problem of S with respect to ϕ is solvable in time exponential in k · |∆| · 2^|ϕ|, with k = max{bdS(w) | w ∈ P × Γ∗ · γ0}.
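The translation f defined earlier in this section is purely syntactic, so it is easy to transcribe. The sketch below follows the clauses of f as stated above; the tuple-based AST (with an explicit 'or' constructor and the derived operators F and G) is our own hypothetical encoding, not the paper's notation.

```python
def G(th):  return ('not', ('U', ('prop', 'true'), ('not', th)))  # G th = ~F~th
def F(th):  return ('U', ('prop', 'true'), th)                    # F th = true U th

BOT, TERM = ('prop', '⊥'), ('prop', 't')   # the fresh propositions ⊥ and t

def f(phi):
    """Restrict path quantification to 'legal' paths, clause by clause."""
    op = phi[0]
    if op == 'prop':
        return phi
    if op == 'not':
        return ('not', f(phi[1]))
    if op == 'and':
        return ('and', f(phi[1]), f(phi[2]))
    if op == 'E':   # f(E phi) = E((G ~⊥) & f(phi)) | E((F t) & f(phi))
        return ('or', ('E', ('and', G(('not', BOT)), f(phi[1]))),
                      ('E', ('and', F(TERM), f(phi[1]))))
    if op == 'A':   # f(A phi) = A((~t U ⊥) | f(phi))
        return ('A', ('or', ('U', ('not', TERM), BOT), f(phi[1])))
    if op == 'X':   # f(X phi) = X(f(phi) & ~⊥)
        return ('X', ('and', f(phi[1]), ('not', BOT)))
    if op == 'U':   # f(p U q) = (f(p) & ~⊥) U (f(q) & ~⊥)
        return ('U', ('and', f(phi[1]), ('not', BOT)),
                     ('and', f(phi[2]), ('not', BOT)))
    raise ValueError(op)

print(f(('E', ('U', ('prop', 'p'), ('prop', 'q')))))
```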
5 Lower Bounds

In this section we give lower bounds for the PMC problem against CTL∗ and for the program complexity of the PMC problem against CTL. The lower bound for CTL (resp., CTL∗) is shown by a reduction from the word problem for Pspace-bounded (resp., Expspace-bounded) alternating Turing Machines. Without loss of generality, we consider a model of alternation with a binary branching degree. Formally, an alternating Turing Machine (TM, for short) is a tuple M = ⟨Σ, Q, Q∀, Q∃, q0, δ, F⟩, where Σ is the input alphabet, which contains the blank symbol #, Q is the finite set of states, which is partitioned into Q = Q∀ ∪ Q∃, Q∃ (resp., Q∀) is the set of existential (resp., universal) states, q0 is the initial state, F ⊆ Q is the set of accepting states, and the transition function δ is a mapping δ : Q × Σ → (Q × Σ × {L, R})². Configurations of M are words in Σ∗ · (Q × Σ) · Σ∗. A configuration η · (q, σ) · η′ denotes that the tape content is ηση′, the current state is q, and the reading head is at position |η| + 1. When M is in state q and reads an input σ ∈ Σ in the current tape cell, it nondeterministically chooses a triple (q′, σ′, dir) in δ(q, σ) = ⟨(ql, σl, dirl), (qr, σr, dirr)⟩, and then moves to state q′, writes σ′ in the current tape cell, and its reading head moves one cell to the left or to the right, according to dir. For a configuration c, we denote by succl(c) and succr(c) the successors of c obtained by choosing respectively the left and the right triple in ⟨(ql, σl, dirl), (qr, σr, dirr)⟩. The configuration c is accepting if the associated state q belongs to F. Given an input x ∈ Σ∗, a computation tree of M on x is a tree in which each node corresponds to a configuration. The root of the tree corresponds to the initial configuration associated with x (we assume that initially M's reading head is scanning the first cell of the tape). A node that corresponds to a universal configuration (i.e., the associated state is in Q∀) has two successors, corresponding to succl(c) and succr(c), while a node that corresponds to an existential configuration (i.e., the associated state is in Q∃) has a single successor, corresponding to either succl(c) or succr(c). The tree is accepting if all its paths (from the root) visit an accepting configuration. An input x ∈ Σ∗ is accepted by M if there exists an accepting computation tree of M on x. If M is Pspace-bounded (resp., Expspace-bounded), then there is a constant k ≥ 1 such that for each x ∈ Σ∗, the space needed by M on input x is bounded by k · |x| (resp., 2^(k·|x|)). It is well known [CKS81] that Exptime (resp., 2Exptime) coincides with the class of all languages accepted by Pspace-bounded (resp., Expspace-bounded) alternating Turing Machines.

Exptime-hardness of the pushdown model checking problem against CTL was shown by Walukiewicz [Wal00] using a reduction from the word problem for Pspace-bounded alternating Turing Machines. We use the basic ideas of the construction in [Wal00] in order to prove that the program complexity of the problem (i.e., assuming the CTL formula is fixed) is still Exptime-hard.

Theorem 2. The program complexity of the PMC problem for CTL is Exptime-hard.

Proof. We show that there is a CTL formula ϕ such that, given a Pspace-bounded alternating Turing Machine M = ⟨Σ, Q, Q∀, Q∃, q0, δ, F⟩ and an input x, it is possible to define a PDS S and a configuration w of S, whose sizes are polynomial in n = k · |x| and in |M| (where k ≥ 1 is a constant such that for each input y ∈ Σ∗, the space needed by M on input y is bounded by k · |y|), such that M accepts x iff (GS, w) |= ϕ.
Note that any reachable configuration of M over x can be seen as a word in Σ∗ · (Q × Σ) · Σ∗ of length exactly n. If x = σ1 . . . σr (where r = |x|), then the initial configuration is given by (q0, σ1)σ2 . . . σr # · · · #, padded with n − r blanks.

S guesses accepting computation trees of M starting from TM configurations of length n. The internal nodes of these trees are non-accepting configurations and the leaves are accepting configurations. The trees are traversed as follows. If the current non-accepting configuration c is universal, then S first examines the subtrees associated with the left successor of c, and successively the subtrees associated with the right successor. If instead c is existential, then S guesses one of the two successors of c and, consequently, examines only the subtrees associated with this successor. In order to guess an accepting tree (if any) from a given configuration, S keeps track on the stack of the path from the root to the current TM configuration, by pushing the newly guessed configurations and popping when backtracking along the accepting subtree guessed so far. Therefore, S accepts by empty stack. The stack alphabet of S is given by Σ ∪ (Q × Σ) ∪ {∃l, ∃r, ∀l, ∀r}, where ∃l and ∃r (resp., ∀l and ∀r) are used to delimit the left and right successors of an existential (resp., universal) configuration. The behaviour of S can be subdivided into three steps.

1. Generation of a TM configuration (operative phase) - S generates nondeterministically, by push transitions, a TM configuration c followed by a symbol in {∀l, ∃l, ∃r} on the stack, with the constraint that ∀l is chosen iff c is a universal configuration (i.e., the TM state q associated with c belongs to Q∀). In this phase, a (control) state of S has the form (gen, q, i, flag), where q ∈ Q keeps track of the TM state associated with c, gen is a label identifying the current operation of S, i ∈ {0, . . . , n + 1} is used to ensure that c has exactly length n, and flag ∈ {0, 1} is used to ensure that c ∈ Σ∗ · (Q × Σ) · Σ∗. When S finishes generating a TM configuration c followed by a symbol m ∈ {∀l, ∃l, ∃r}, i.e., when S is in a state of the form (gen, q, n + 1, 1), it chooses nondeterministically between two possible options. Choosing the first option, S goes to state cont, pops m from the stack, and performs Step 3 (see below). Choosing the second option, the behaviour of S depends on whether c is accepting. If c is not accepting (i.e., q ∉ F), then S guesses a successor of c, going to a state of the form (gen, q′, 0, 0) for some q′ ∈ Q without changing the stack content. Therefore, Step 1 is newly performed. If instead c is accepting (i.e., q ∈ F), then S goes to state rem, pops m from the stack, and performs Step 2 (see below).

2. Removing a TM configuration (operative phase) - When S is in state rem, it removes deterministically, by pop transitions, the TM configuration c on the top of the stack (if any). After having removed c, if the symbol on the top of the stack, say B, belongs to {∀r, ∃l, ∃r} (this means intuitively that S has already generated a "pseudo" accepting computation tree for the TM configuration currently on the top of the stack), then S pops B from the stack and goes to state rem (i.e., Step 2 is newly performed).
If instead B = ∀l, then S goes to a state of the form (gen, q′, 0, 0) for some q′ ∈ Q and replaces ∀l with the symbol ∀r on the top of the stack. Therefore, Step 1 is newly performed. Finally, if B = γ0 (i.e., the stack is empty), then S goes to state fin and terminates its computation.

3. Checking δ-consistency (control phase) - When S is in state cont, it checks that one of the following holds:
– the stack contains exactly one TM configuration;
– the stack content has the form c′ · m · c · α, where c and c′ are TM configurations and m ∈ {∃l, ∃r, ∀l, ∀r}.

In the first case, S signals success by generating (by its finite control) the symbol good. In the second case, S signals success if and only if c′ is a TM successor of c in accordance with m, i.e., c′ = succs(c) where s = l iff m ∈ {∃l, ∀l}. In order to understand how this can be done by using a number of states polynomial in n and |M|, let c = a1 . . . an. For each 1 ≤ i ≤ n, the value a′i of the i-th cell of succl(c) (resp., succr(c)) is completely determined by the values ai−1, ai and ai+1 (taking an+1 for i = n and a0 for i = 1 to be some special symbol, say "−"). As in [KTMV00], we denote by nextl(ai−1, ai, ai+1) (resp., nextr(ai−1, ai, ai+1)) our expectation for a′i (these functions can be trivially obtained from the transition function of M). Then, in state cont, S chooses nondeterministically between n states cont1, . . . , contn without changing the stack content. For each 1 ≤ i ≤ n, if S is in state conti, then first it deterministically removes c′ · m from the stack, keeping track by its finite control of m and of the i-th symbol a′i of c′. Successively, S deterministically removes c from the stack, keeping also track of the symbols ai−1, ai, and ai+1. Finally, S checks whether a′i = nexts(ai−1, ai, ai+1), with s = l iff m ∈ {∃l, ∀l}. If this condition is satisfied (and only in this case), S generates the symbol good and terminates the computation.

Formally, S = ⟨AP, Γ, P, ∆, L⟩ is defined as follows:
– AP = {op, cont, good, fin} and Γ = Σ ∪ (Q × Σ) ∪ {∀l, ∀r, ∃l, ∃r};
– P = {good, fin, rem} ∪ PG ∪ Pδ, where PG = {(gen, q, i, flag) | q ∈ Q, 0 ≤ i ≤ n + 1, flag ∈ {0, 1}, flag = 0 if i = 0 and flag = 1 if i = n, n + 1} is the set of (control) states used in Step 1, and Pδ, which is used in Step 3, is given by {cont, cont1, . . . , contn} ∪ {(conti, j, a) | 1 ≤ i, j ≤ n and a ∈ Σ ∪ (Q × Σ)} ∪ {(conti, j, a, m, a1, a2, a3) | 1 ≤ i ≤ n, 0 ≤ j ≤ n, m ∈ {∀l, ∀r, ∃l, ∃r}, a, a1, a2, a3 ∈ Σ ∪ (Q × Σ) ∪ {−}, and a ≠ −};
– ((p, B), (p′, β)) ∈ ∆ iff one of the following holds:
• Step 1 (generation of a TM configuration) - If p ∈ PG, then:
∗ if p = (gen, q, i, flag) and i < n, then β = B′B with B′ ∈ Σ ∪ ({q} × Σ) and p′ = (gen, q, i + 1, flag′). Moreover, if flag = 1, then B′ ∈ Σ and flag′ = 1; otherwise, flag′ = 0 iff B′ ∈ Σ;
∗ if p = (gen, q, n, 1), then p′ = (gen, q, n + 1, 1) and β = B′B with B′ = ∀l if q ∈ Q∀, and B′ ∈ {∃l, ∃r} otherwise;
∗ if p = (gen, q, n + 1, 1), then either (1) β = ε and p′ = cont, or (2) q ∉ F, β = B ∈ Γ, and p′ = (gen, q′, 0, 0) for some q′ ∈ Q, or (3) q ∈ F, β = ε, and p′ = rem.
• Step 2 (removing a TM configuration) - If p = rem, then:
∗ if B ∈ Σ ∪ (Q × Σ) ∪ {∀r, ∃l, ∃r}, then β = ε and p′ = rem;
∗ if B = ∀l, then β = ∀r and p′ = (gen, q′, 0, 0) for some q′ ∈ Q;
∗ if B = γ0, then β = ε and p′ = fin.
• Step 3 (checking δ-consistency) -
If p ∈ Pδ, then:
∗ if p = cont, then β = B and p′ = conti for some 1 ≤ i ≤ n;
∗ if p = conti, then B ∈ Σ ∪ (Q × Σ), β = ε, and p′ = (conti, 1, B);
∗ if p = (conti, j, a) and j < n, then B ∈ Σ ∪ (Q × Σ), β = ε, and p′ = (conti, j + 1, a′), where a′ = B if j = i − 1, and a′ = a otherwise;
∗ if p = (conti, n, a), then either B = γ0, β = ε, and p′ = good, or B ∈ {∃l, ∀l, ∃r, ∀r}, β = ε, and p′ = (conti, 0, a, B, −, −, −);
∗ if p = (conti, j, a, m, a1, a2, a3) and j < n, then B ∈ Σ ∪ (Q × Σ), β = ε, and p′ = (conti, j + 1, a, m, a′1, a′2, a′3), where for each 1 ≤ h ≤ 3, a′h = B if j = i + h − 3, and a′h = ah otherwise;
∗ if p = (conti, n, a, m, a1, a2, a3), then a = nexts(a1, a2, a3), where s = l if and only if m ∈ {∃l, ∀l}; moreover, β = ε and p′ = good.
– For all B ∈ Γ ∪ {γ0}: L(good, B) = {good}, L(fin, B) = {fin}, L(rem, B) = {op}, for all p ∈ PG, L(p, B) = {op}, and for all p ∈ Pδ, L(p, B) = {cont}.

Let GS = ⟨W, R, µ⟩. The correctness of the construction is stated by the following claim:

Claim. Given a TM configuration c with TM state q, there is an accepting computation tree of M over c iff there is a path of GS of the form π = w0 w1 . . . wm such that w0 = ((gen, q, n, 1), c · γ0), µ(wm) = fin, and for each 0 ≤ i ≤ m − 1, µ(wi) = op and, if wi has a successor w′i such that µ(w′i) = cont, then each path from w′i visits a state of the form (good, β).

The condition in the claim above can be encoded by the following CTL formula:

ϕ := E[(op ∧ AX(cont → AF good)) U fin]   (1)

Let c0 be the initial TM configuration (associated with the input x). Then, by the claim it follows that M accepts x iff (GS, w) |= ϕ, where w = ((gen, q0, n, 1), c0 · γ0). Since ϕ is independent of M and n, and the sizes of |S| and w are polynomial in n and |M|, the assertion holds. □
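The δ-consistency check above hinges on the cell functions nextl and nextr. As a quick illustration, here is a Python sketch of how these functions can be derived from the transition function of an alternating TM. The dictionary encoding of δ and all names are our own assumptions, not the paper's notation.

```python
def make_next(delta, side):
    """Build next_l (side = 0) or next_r (side = 1).
    delta maps (state, symbol) to ((ql, sl, dirl), (qr, sr, dirr));
    a cell is either a plain tape symbol or a pair (state, symbol);
    '-' is the boundary marker used for a0 and a_{n+1}."""
    def nxt(left, mid, right):
        if isinstance(mid, tuple):          # head is on the middle cell
            q, s = mid
            _, written, _ = delta[(q, s)][side]
            return written                  # symbol rewritten, head moves away
        if isinstance(left, tuple):         # head is just to the left
            q, s = left
            q2, _, direction = delta[(q, s)][side]
            return (q2, mid) if direction == 'R' else mid
        if isinstance(right, tuple):        # head is just to the right
            q, s = right
            q2, _, direction = delta[(q, s)][side]
            return (q2, mid) if direction == 'L' else mid
        return mid                          # head is far away: cell unchanged
    return nxt

# Tiny hypothetical machine: in state 'q' reading '0', the left choice
# writes '1' and moves right into state 'p'.
delta = {('q', '0'): (('p', '1', 'R'), ('p', '0', 'L'))}
next_l = make_next(delta, 0)
print(next_l('-', ('q', '0'), '0'))   # -> '1'
print(next_l(('q', '0'), '0', '-'))   # -> ('p', '0')
```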
Theorem 3. Pushdown model checking against CTL∗ is 2Exptime-hard.

Proof. Let M = ⟨Σ, Q, Q∀, Q∃, q0, δ, F⟩ be an Expspace-bounded alternating Turing Machine, and let k be a constant such that for each x ∈ Σ∗, the space needed by M on input x is bounded by 2^(k·|x|). Given an input x ∈ Σ∗, we define a PDS S, a configuration w0 = (p0, γ0) of S, and a CTL∗ formula ϕ, whose sizes are polynomial in n = k · |x| and in |M|, such that M accepts x iff (GS, w0) |= ϕ. Some ideas in the proposed reduction are taken from [KTMV00], where lower bounds are given for the satisfiability of extensions of CTL and CTL∗. Note that any reachable configuration of M over x can be seen as a word in Σ∗ · (Q × Σ) · Σ∗ of length exactly 2^n. If x = σ1 . . . σr (where r = |x|), then the initial configuration is given by (q0, σ1)σ2 . . . σr # · · · #, padded with 2^n − r blanks.

Each cell of a TM configuration is coded using a block of n symbols of the stack alphabet of S. The whole block is used to encode both the content of the cell and its location (the cell number) on the TM tape (note that the cell number is in the range [0, 2^n − 1] and can be encoded using n bits). The stack alphabet is given by {∀l, ∀r, ∃l, ∃r} ∪ ((Σ ∪ (Q × Σ)) × 2^{b, e, f, cn, l}), where b is used to mark the first element of a TM block, e (resp., f) to mark the first (resp., the last) block of a TM configuration, cn to encode the cell number, and l to mark a left TM successor. The behaviour of S is similar to that of the pushdown system defined in the proof of Theorem 2. The main differences can be summarized as follows:

– Generation of a TM configuration (Step 1) - When S nondeterministically generates a TM configuration c on the stack, it ensures that each block of c has length n and that the symbols b, f, and e are used properly. Moreover, if c is generated as a successor of another TM configuration, i.e., the stack content before generating c has the form m · α with m ∈ {∃l, ∃r, ∀l, ∀r}, then S ensures that the label l is used properly, i.e., every element of c is marked by l iff m ∈ {∃l, ∀l}. However, S does not ensure that the cell numbers of c are encoded properly (indeed, this would require a number of control states exponential in n).

– Generation of the initial TM configuration - Starting from the global state w0 = (p0, γ0), S first generates the encoding of the initial TM configuration c0 (associated with the input x) on the stack. Note that S ensures that c0 has the form (q0, σ1)σ2 . . . σr ## . . .. However, S does not ensure that the number of blanks to the right of σr is exactly 2^n − r.

– Checking δ-consistency - As for the pushdown system defined in the proof of Theorem 2, after having generated a TM configuration on the stack, S can choose nondeterministically to go to the (control) state cont. When S is in state cont, it chooses nondeterministically between two options, cont1 and cont2 (without changing the stack content). Assume that the stack content has the form c · α, where c is a "pseudo" TM configuration generated in Step 1, and either α is empty or it has the form m · c′ · α′, where m ∈ {∃l, ∃r, ∀l, ∀r} and c′ is a "pseudo" TM configuration. Then, choosing option cont1, S removes deterministically (by pop transitions) c from the stack and terminates its computation. The computation tree ⟨T, V⟩ of GS rooted at the global state associated with cont1 reduces to a finite path π (corresponding to the configuration c). We use a CTL∗ formula ϕ1 on this tree ⟨T, V⟩ in order to require that the cell numbers of c are encoded correctly (this also implies that the number of blocks of c is exactly 2^n). For each node u ∈ π, let cn(u) be the truth value (1 for true and 0 for false) of the proposition cn in u. Let us consider two consecutive TM blocks u1 . . . un u′1 . . . u′n along π, and let k (resp., k′) be the cell number of the first block (resp., the second block), i.e., the integer whose binary code is given by cn(u1) . . . cn(un) (resp., cn(u′1) . . . cn(u′n)). We have to require that k′ = (k + 1) mod 2^n, and that k = 0 (resp., k′ = 2^n − 1) if u1 . . . un corresponds to the first block of c, i.e., u1 is labelled by proposition e (resp., u′1 . . . u′n corresponds to the last block of c, i.e., u′1 is labelled by proposition f). Therefore, ϕ1 is defined as follows:

AG( ((b ∧ e) → ⋀_{j=0}^{n−1} (AX)^j ¬cn) ∧ ((b ∧ f) → ⋀_{j=0}^{n−1} (AX)^j cn) ∧ ((b ∧ ¬f) → ⋁_{j=0}^{n−1} [ (AX)^j(¬cn ∧ (AX)^n cn) ∧ ⋀_{i>j} (AX)^i(cn ∧ (AX)^n ¬cn) ∧ ⋀_{i<j} (AX)^i(cn ↔ (AX)^n cn) ]) )
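Since ϕ1 is a formula schema parametrized by n, it may help to see it generated mechanically. The following Python sketch assumes our reconstruction of ϕ1 above and an ad-hoc ASCII rendering of ours (~, &, |, ->, <->, and AX^j for j-fold nesting of AX); it emits the formula for a given n.

```python
def phi1(n):
    """Emit the cell-number consistency formula for blocks of n bits."""
    ax = lambda j, th: th if j == 0 else f"AX^{j}({th})"
    first = " & ".join(ax(j, "~cn") for j in range(n))   # first block: number 0
    last  = " & ".join(ax(j, "cn") for j in range(n))    # last block: 2^n - 1
    # Binary increment: some bit j flips 0 -> 1, the less significant bits
    # (i > j) flip 1 -> 0, the more significant bits (i < j) are unchanged.
    incr = " | ".join(
        "(" + " & ".join(
            [ax(j, f"(~cn & AX^{n}(cn))")]
            + [ax(i, f"(cn & AX^{n}(~cn))") for i in range(j + 1, n)]
            + [ax(i, f"(cn <-> AX^{n}(cn))") for i in range(j)]) + ")"
        for j in range(n))
    return (f"AG( ((b & e) -> ({first})) & ((b & f) -> ({last})) & "
            f"((b & ~f) -> ({incr})) )")

print(phi1(2))   # the schema instantiated for 2-bit cell numbers
```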
Choosing the second option cont2, S first removes deterministically c from the stack, with the additional ability to generate (by its finite control) the symbol check1. Successively, assuming that α has the form m · c′ · α′, S removes m · c′ from the stack (by pop transitions) and simultaneously generates (by its finite control) at most at one block of c′ the symbol check2. After this operation, S terminates its computation. Let ⟨T, V⟩ be the computation tree of GS rooted at the global state associated with cont2. If α is empty, then by construction T reduces to a finite path labelled by proposition check1 and corresponding to the configuration c. If instead α has the form m · c′ · α′, then each path (from the root) of T consists of a sequence of nodes corresponding to c labelled by check1, followed by a sequence of nodes corresponding to c′ with at most one block labelled by check2. This allows us to define a CTL∗ formula ϕ2, asserted on the tree ⟨T, V⟩ (whose size is polynomial in n and |M|), in order to require that, in the case α is not empty (i.e., α has the form m · c′ · α′), c is a TM successor of c′ in accordance with m, i.e., c = succs(c′) where s = l iff m ∈ {∃l, ∀l} (note that by Step 1, m ∈ {∃l, ∀l} iff c is marked by symbol l). Formula ϕ2 is defined as follows:

AG(¬check2) ∨ AG((check1 ∧ b) → E(θ1 ∧ θ2))

where the path formulas θ1 and θ2 are defined below. Note that the subformula AG(¬check2) manages the case in which α is empty. In the other case, we require that for each node u ∈ T labelled by check1 and b, i.e., associated with the first element of a block bl of c, there is a path π from u satisfying the following two properties:

1. π visits a node labelled by check2 and b, i.e., associated with the first element of a block bl′ of c′, such that bl and bl′ have the same cell number. This requirement is specified by the path formula θ1:

θ1 := ψ1 ∧ X(ψ2 ∧ X(ψ3 ∧ . . . X(ψn) . . .))

where, for each 1 ≤ j ≤ n, ψj is defined as (cn → F(check2 ∧ b ∧ X^{j−1} cn)) ∧ (¬cn → F(check2 ∧ b ∧ X^{j−1} ¬cn)).

2. Let Σ′ := Σ ∪ (Q × Σ) and let us denote by σ(bl) the Σ′-value of a TM block bl. By construction and Property 1 above, there is exactly one node of π that is labelled by check2 and b. Moreover, by Property 1 this node is associated with a TM block bl′ of c′ having the same cell number as bl. Therefore, we have to require that σ(bl) = nexts(σ(bl′prec), σ(bl′), σ(bl′succ)), where bl′prec and bl′succ represent the blocks soon before and soon after bl′ along π, and s = l iff the TM configuration c is a left TM successor (i.e., all nodes of bl are labelled by proposition l). This requirement is expressed by the path formula θ2. We distinguish three cases depending on whether bl′ corresponds to the first block, to the last block, or to a non-extremal block of the associated TM configuration c′. For simplicity, we consider only the case in which bl′ is a non-extremal block; the other cases can be handled similarly:

θ2 := (¬f ∧ ¬e) → ⋁_{σ1,σ2,σ3 ∈ Σ′} [ (l → nextl(σ1, σ2, σ3)) ∧ (¬l → nextr(σ1, σ2, σ3)) ∧ F(σ1 ∧ (X)^n(σ2 ∧ b ∧ check2 ∧ (X)^n σ3)) ]

Finally, formula ϕ is obtained from formula (1) in the proof of Theorem 2 by replacing the subformula AX(cont → AF good) in (1) with the formula AX(cont → EX(cont1 ∧ ϕ1) ∧ EX(cont2 ∧ ϕ2)).

Now, we can prove the main result of this paper.

Theorem 4. (1) The program complexity of the PMC problem for CTL is Exptime-complete. (2) The PMC problem for CTL∗ is 2Exptime-complete. The program complexity of the problem is Exptime-complete.

Proof. Claim 1 follows from Theorem 2 and the fact that model checking pushdown systems against CTL is known to be Exptime-complete [Wal00], while Claim 2 directly follows from Theorems 1 and 3, and Claim 1. □

References

[AEM04] R. Alur, K. Etessami, and P. Madhusudan. A temporal logic of nested calls and returns. In TACAS'04, pages 467–481, 2004.
[BEM97] A. Bouajjani, J. Esparza, and O. Maler. Reachability analysis of pushdown automata: Application to model-checking. In CONCUR'97, LNCS 1243, pages 135–150. Springer-Verlag, 1997.
[Cau96] D. Caucal. On infinite transition graphs having a decidable monadic theory. In Proc. of the 23rd International Colloquium on Automata, Languages and Programming (ICALP'96), LNCS 1099, pages 194–205. Springer-Verlag, 1996.
[CE81] E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using branching time temporal logic. In Proceedings of Workshop on Logic of Programs, LNCS 131, pages 52–71. Springer-Verlag, 1981.
[CKS81] A.K. Chandra, D.C. Kozen, and L.J. Stockmeyer. Alternation. Journal of the ACM, 28(1):114–133, 1981.
[EH86] E.A. Emerson and J.Y. Halpern. Sometimes and not never revisited: On branching versus linear time. Journal of the ACM, 33(1):151–178, 1986.
[EHRS00] J. Esparza, D. Hansel, P. Rossmanith, and S. Schwoon. Efficient algorithms for model checking pushdown systems. In CAV'00, LNCS 1855, pages 232–247. Springer-Verlag, 2000.
[EJ91] E.A. Emerson and C.S. Jutla. Tree automata, µ-calculus and determinacy. In FOCS'91, pages 368–377, 1991.
[EKS03] J. Esparza, A. Kucera, and S. Schwoon. Model checking LTL with regular valuations for pushdown systems. Inf. Comput., 186(2):355–376, 2003.
[KPV02] O. Kupferman, N. Piterman, and M.Y. Vardi. Pushdown specifications. In LPAR'02, LNCS 2514, pages 262–277. Springer-Verlag, 2002.
[KTMV00] O. Kupferman, P.S. Thiagarajan, P. Madhusudan, and M.Y. Vardi. Open systems in reactive environments: Control and synthesis. In CONCUR'00, LNCS 1877, pages 92–107. Springer-Verlag, 2000.
[KVW00] O. Kupferman, M.Y. Vardi, and P. Wolper. An automata-theoretic approach to branching-time model checking. Journal of the ACM, 47(2):312–360, 2000.
[MS85] D.E. Muller and P.E. Schupp. The theory of ends, pushdown automata, and second-order logic. Theoretical Computer Science, 37:51–75, 1985.
[MS87] D.E. Muller and P.E. Schupp. Alternating automata on infinite trees. Theoretical Computer Science, 54:267–276, 1987.
[PV04] N. Piterman and M.Y. Vardi. Global model-checking of infinite-state systems. In CAV'04, LNCS 3114, pages 387–400. Springer-Verlag, 2004.
[SC85] A.P. Sistla and E.M. Clarke. The complexity of propositional linear temporal logics. Journal of the ACM, 32(3):733–749, 1985.
[Var88] M.Y. Vardi. A temporal fixpoint calculus. In POPL'88, pages 250–259. ACM Press, 1988.
[Var98] M.Y. Vardi. Reasoning about the past with two-way automata. In ICALP'98, LNCS 1443, pages 628–641. Springer-Verlag, 1998.
[Wal96] I. Walukiewicz. Pushdown processes: Games and model checking. In CAV'96, LNCS 1102, pages 62–74. Springer-Verlag, 1996.
[Wal00] I. Walukiewicz. Model checking CTL properties of pushdown systems. In FSTTCS'00, LNCS 1974, pages 127–138. Springer-Verlag, 2000.

A Compositional Logic for Control Flow

Gang Tan¹ and Andrew W. Appel²

¹ Computer Science Department, Boston College. gtan@cs.bc.edu
² Computer Science Department, Princeton University. appel@cs.princeton.edu

Abstract. We present a program logic, Lc, which modularly reasons about unstructured control flow in machine-language programs. Unlike previous program logics, the basic reasoning units in Lc are multiple-entry and multiple-exit program fragments. Lc provides fine-grained composition rules to compose program fragments. It is not only useful for reasoning about unstructured control flow in machine languages, but also useful for deriving rules for common control-flow structures such as while-loops, repeat-until-loops, and many others.
We also present a semantics for Lc and prove that the logic is both sound and complete with respect to the semantics. As an application, Lc and its semantics have been implemented on top of the SPARC machine language, and are embedded in the Foundational Proof-Carrying Code project to produce memory-safety proofs for machine-language programs.

1 Introduction

Hoare Logic [1] has long been used to verify properties of programs written in high-level programming languages. In Hoare Logic, a triple {p}s{q} describes the relationship between exactly two states, the normal entry and exit states, associated with a program execution. That is, if the state before execution of s satisfies the assertion p, then the state after execution satisfies q. For a high-level programming language with structured control flow, a program logic based on Hoare triples works fine. However, programs in high-level languages are compiled into machine code to execute. Since it is hard to prove that a compiler with complex optimizations produces correct machine code from verified high-level-language programs, substantial research effort [2, 3, 4] during recent years has been devoted to verifying properties directly at the machine-language level. Machine-language programs contain goto statements with unrestricted destinations. Therefore, a program fragment, or a collection of statements, possibly contains multiple exits and multiple entries to which goto statements might jump. In Hoare Logic, since a triple {p}s{q} is tailored to describe the relationship between the normal-entry and the normal-exit states, it is not surprising that trouble arises in considering program fragments with more than one entry/exit. To address the problem of reasoning about control flow in machine-language programs, this paper makes two main contributions:

– We propose a program logic, Lc, which modularly reasons about machine-language program fragments. Its basic reasoning units are multiple-entry and multiple-exit program fragments. The logic composes program fragments with a set of fine-grained composition rules. As a result, Lc is more modular than previous program logics for control flow.
– We also develop a semantics for Lc. We will show that a naive semantics does not work; we need a semantics based on approximations that count computation steps. Based on this semantics, soundness and (relative) completeness of Lc are proved.

Before a full technical development, we present an overview of Lc and its related work.

Overview of Lc. Two design features give Lc its modularity: its judgment (form of specification) and its composition rules. The judgment in Lc is directly on multiple-entry and multiple-exit program fragments. For example, Lc treats a conditional-branch statement "if b goto l" as a one-entry and two-exit fragment. Lc then provides for "if b goto l" a rule which associates the entries and exits with appropriate invariants, depicted as follows:

[Diagram: the fragment "if b goto l", with entry invariant p, fall-through exit invariant p ∧ ¬b, and branch exit invariant p ∧ b at label l.]

The above graph associates the invariant p with the entry, and associates p ∧ ¬b and p ∧ b with the two exits, respectively. As a note on our convention, we put invariants on the right of edges; we put labels, when they exist, on the left.
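To make the shape of such entry/exit annotations concrete, here is a toy Python rendering of the one-entry, two-exit judgment for "if b goto l". The record layout and string-based assertions are our invention, not the paper's formal syntax (which is developed in Section 2).

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    fragments: list   # program fragments, e.g. ("l0", ("if", "b", "l"), "l1")
    exits: dict       # exit label -> assumed precondition
    entries: dict     # entry label -> concluded precondition

def if_goto_rule(l, b, l_target, l_next, p):
    """One-entry/two-exit reading of  l : (if b goto l_target) : l_next,
    with entry invariant p, fall-through exit p & ~b, branch exit p & b."""
    return Judgment(
        fragments=[(l, ("if", b, l_target), l_next)],
        exits={l_next: f"({p}) & ~({b})", l_target: f"({p}) & ({b})"},
        entries={l: p})

print(if_goto_rule("l0", "x > 0", "l", "l1", "x >= 0"))
```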
Lc also provides a set of inference rules to compose judgments on program fragments. These inference rules reason about control flow in smaller steps than Hoare Logic. For example, to reason about while loops, Hoare Logic provides a while rule:

    {p ∧ b} s {p}
  ----------------------------- (while)
    {p} while b do s {p ∧ ¬b}

A while loop, however, is a high-level language construct. When mapped to machine code, it is implemented by a sequence of more primitive statements. One implementation is:

    l : if ¬b goto l′;  l1 : s;  l2 : goto l;  l′ :   (1)

Since the implementation contains unstructured control flow, Hoare Logic cannot reason about it. In contrast, our logic can treat each statement in the implementation as a multiple-entry and multiple-exit fragment. Using its composition rules, the logic can combine fragments and eliminate intermediate entries and exits. In the end, from its composition rules, the logic can derive the Hoare-logic rule for while loops. Furthermore, it can derive the rules for sequential composition, repeat-until loops, if-then-else statements, and many other structured control-flow constructs. Therefore, our logic can recover structured control flow, when present, in machine-language programs.

Related work on program logics for goto statements. Many researchers have also realized the difficulty of verifying properties of programs with goto statements in Hoare Logic [5, 6, 7, 8, 9]. Some of them have proposed improvements over Hoare Logic. Almost all of these works are at the level of high-level languages: they treat while loops as a separate syntactic construct and have a rule for them. In comparison, Lc derives rules for control-flow structures. These previous works also differ from Lc in terms of the form of specification. The work by de Bruin [8] is a typical example. In his system, the judgment for a statement s is

    L1 : p1, . . . , Ln : pn ⊢ {p} s {q},   (2)

where L1, . . . , Ln are labels in a program P, the assertion pi is the invariant associated with the label Li, and the statement s is a part of the program P. Judgment (2) judges a triple {p}s{q}, but under all label invariants in a program. By explicitly supplying invariants for labels in the judgment, de Bruin's system can handle goto statements, and its rule for goto statements is L1 : p1, . . . , Ln : pn ⊢ {pi} goto Li {false}. Judgment (2) is sufficient for verifying properties of programs with goto statements. Typed Assembly Language (TAL [3]) by Morrisett et al. uses a similar judgment to verify type safety of assembly-language programs. However, judgment (2) assumes the availability of global information, because it judges a statement s under all label invariants of a program, L1 : p1, . . . , Ln : pn. Consequently, it is impossible for de Bruin's system or TAL to compose fragments with different sets of global label invariants. We believe that a better form of specification should judge s under only those label invariants associated with exits in s. This new form of specification makes fewer assumptions (fewer label invariants) about the rest of the program and is more modular.

Floyd's work [10] on program verification associates a predicate with each arc in the flowchart representation of a program. The program is correct if each statement in the program has been verified correct with respect to the predicates associated with the entry and exit arcs of the statement. In Floyd's system, however, the composition of statements is based on flowcharts and is informal, and it has no principles for eliminating intermediate arcs.
Our Lc provides formal rules for composing statements. Moreover, when verifying properties of goto statements and labels, Floyd's system also assumes the availability of the complete program.

Cardelli proposed a linking logic [11] to formalize program linking. Glew and Morrisett [12] defined a modular assembly language to perform type-safe linking. Our logic is related to these works because exit labels can be thought of as the imported labels of a module, and entry labels as its exported labels. In some sense, we apply the idea of modular linking to the verification of machine code. But since we are more concerned with program verification, we also provide a semantics for our logic and prove it both sound and complete.

Recent works by Benton [13], Ni and Shao [14], and Saabas and Uustalu [15] define compositional program logics for low-level machines; their systems also reason modularly about program fragments and linking. To deal with procedure calls and returns, Benton uses Hoare-style pre- and postconditions. Since our compiler uses continuation-passing style, so can our calculus; therefore our labels need only preconditions.

The rest of this paper is organized as follows. Section 2 presents the logic Lc on a simple imperative language with unstructured control flow. In Section 3, we develop a semantics for the logic; the soundness and completeness theorems are then presented. In Section 4, we briefly discuss the implementation and the role of Lc in the Foundational Proof-Carrying Code project [4]. In Section 5, we conclude and discuss future work. A more detailed treatment of the logic, its semantics, and its applications can be found in the first author's PhD thesis [16].

2 Program Logic Lc

We present Lc on a simple imperative language, whose syntax appears in Figure 1. Most of the syntax is self-explanatory; we stress only a few points. First, since the particular set of primitive operators and relations does not affect the presentation, the language assumes a class of operator symbols, OPSym, and a class of relation symbols, RSym. For concreteness, OPSym could be {+, ×, 0, 1} and RSym could be {=, <}. Second, boolean expressions do not include standard constructors such as false, ∧ and ⇒; these can be defined from true, ∨ and ¬.

  operator symbols      op ∈ OPSym
  relation symbols      re ∈ RSym
  variables             x, y, z ∈ Var
  labels                l ∈ Label
  primitive statements  t ∈ PrimStmt ::= x := e | goto l | if b goto l
  statements            s ∈ Stmt ::= t | l : s | (s1 ; s2)
  expressions           e ∈ Exp ::= x | op(e1, . . . , e_ar(op))
  boolean expressions   b ∈ BExp ::= true | b1 ∨ b2 | ¬b | re(e1, . . . , e_ar(re))

Fig. 1. Language syntax, where ar(op) is the arity of the symbol op

The language in Fig. 1 is tailored to imitate a machine language. The destination of a goto statement is unrestricted and may be a label in the middle of a loop. Furthermore, the language does not have control structures such as if b then s or while b do s; these are implemented by sequences of primitive statements. To simplify the presentation, the language differs from machine languages in several respects. It uses abstract labels, while machine languages use concrete addresses; this difference does not affect the results of Lc. The language also lacks indirect jumps (jumps through a variable), pc-relative jumps, and procedure calls. We discuss in Section 4 how we deal with these features.

2.1 Syntax and Rules of Lc

The syntax of Lc is in Fig. 2.

Program fragments.
A program fragment, l : (t) : l′, is a primitive statement t with a start label l and an end label l′. The label l identifies the left side of t, the normal entry, and l′ identifies the right side of t, the normal exit. We also use l1 : (s1 ; s2) : l3 as an abbreviation for the two fragments l1 : (s1) : l2 and l2 : (s2) : l3, where l2 is a new label. We use the symbol F for a set of fragments.

  fragments                f ∈ Fragment ::= l : (t) : l′
  fragment sets            F ∈ FragSet ::= {l1 : (t1) : l′1, . . . , ln : (tn) : l′n}
  assertions               p ∈ Assertion ::= true | p1 ∨ p2 | ¬p | re(e1, . . . , e_ar(re)) | ∃x.p
  label-continuation sets  Ψ ∈ LContSet ::= {l1 ▷ p1, . . . , ln ▷ pn}

Fig. 2. Lc: Syntax

Assertions and label continuations. Assertions are meant to describe predicates on states. Lc can use any assertion language; we use first-order classical logic in this presentation (see Fig. 2). This assertion language is a superset of the language of boolean expressions. We omit conjunction and universal quantifiers, since they can be defined classically from the other constructors. Lc is parametrized over a deduction system, D, which derives true formulas in the assertion language. We leave the rules of D unspecified, and assume that its judgment is ⊢D p, read as "p is a true formula".

A label identifies a point in a program. To associate assertions with labels, Lc uses the notation l ▷ p, pronounced "l with p". In Hoare Logic, when an assertion p is associated with a label l in a verified program, then whenever control reaches l, the assertion p is true of the current state. In Lc, we interpret l ▷ p differently: if l ▷ p is true in a program, then whenever p is satisfied, it is safe to continue from l (that is, to jump to l). Therefore, we call p a precondition of the label l, and call l ▷ p a label continuation. We use the symbol Ψ for a set of label continuations.

Form of specification. In Lc, the judgment that specifies properties of multiple-entry and multiple-exit program fragments has the syntax

  F ; Ψ′ ⊢ Ψ,

where F is a set of program fragments, and Ψ′ and Ψ are sets of label continuations. Suppose Ψ′ = {l′1 ▷ p′1, . . . , l′m ▷ p′m} and Ψ = {l1 ▷ p1, . . . , ln ▷ pn}. The labels l′1, . . . , l′m in Ψ′ are exits of F, and the labels l1, . . . , ln in Ψ are entries of F. (The accompanying figure depicts F as a box whose incoming edges l1, . . . , ln carry the invariants p1, . . . , pn of Ψ, and whose outgoing edges l′1, . . . , l′m carry the invariants p′1, . . . , p′m of Ψ′.)

With this relationship in mind, an informal interpretation of the judgment F ; Ψ′ ⊢ Ψ is as follows: for the set of fragments F, if it is safe to continue from any of the exit labels whenever the associated assertion is true, then it is safe to continue from any of the entry labels whenever the associated assertion is true. This interpretation draws conclusions about entry labels from assumptions about exit labels. Note, however, that this interpretation is simplified; the precise interpretation we adopt for F ; Ψ′ ⊢ Ψ in Section 3 has an additional requirement: it takes at least one computation step from an entry to reach an exit. We ignore this issue for now and come back to it later.

Using this judgment, Lc provides rules for primitive statements. For example, Fig. 3 provides a rule for the assignment statement. In the assignment rule, the fragment l : (x := e) : l′ has one entry, namely l, and one exit, namely l′.
The assignment rule states that if it is safe to continue from the exit l′ when p is true, then it is safe to continue from the entry l when p[e/x] is true. The reason can be established informally as follows. Suppose we start from l in an initial state where the next statement to execute is x := e and p[e/x] is true. The new state after the execution of the statement reaches the exit l′ and, by the semantics of x := e, the assertion p is true in it. Since we assume it is safe to continue from l′ when p is true, the new state can safely continue; hence the initial state can safely continue from l when p[e/x] is true.

In Hoare Logic, the assignment rule is {p[e/x]} x := e {p}, essentially the same as the assignment rule in Lc. In general, for any statement s that has only the normal entry and the normal exit, a Hoare triple {p}s{q} has a corresponding judgment in Lc:

  {l : (s) : l′} ; {l′ ▷ q} ⊢ {l ▷ p}.

But unlike Hoare triples, F ; Ψ′ ⊢ Ψ is a more general judgment on multiple-entry and multiple-exit fragments. This generality is used in the rule for conditional-branch statements, if b goto l1, in Fig. 3. A conditional-branch statement has two possible exits; therefore, the if rule assumes two exit label continuations.

Composition rules. The strength of Lc lies in its composition rules. These rules compose judgments on individual statements into properties of the combined statement; by internalizing the control flow of the combined statement, they allow modular reasoning. Figure 3 shows Lc's rules:

  ─────────────────────────────────────────────────  (assignment)
  {l : (x := e) : l′} ; {l′ ▷ p} ⊢ {l ▷ p[e/x]}

  ─────────────────────────────────────────  (goto)
  {l : (goto l1) : l′} ; {l1 ▷ p} ⊢ {l ▷ p}

  ──────────────────────────────────────────────────────────────  (if)
  {l : (if b goto l1) : l′} ; {l1 ▷ p ∧ b, l′ ▷ p ∧ ¬b} ⊢ {l ▷ p}

  F1 ; Ψ′1 ⊢ Ψ1    F2 ; Ψ′2 ⊢ Ψ2
  ────────────────────────────────  (combine)
  F1 ∪ F2 ; Ψ′1 ∪ Ψ′2 ⊢ Ψ1 ∪ Ψ2

  F ; Ψ′ ∪ {l ▷ p} ⊢ Ψ ∪ {l ▷ p}
  ───────────────────────────────  (discharge)
  F ; Ψ′ ⊢ Ψ ∪ {l ▷ p}

  F ; Ψ′2 ⊢ Ψ2    ⊢ Ψ′1 ⇒ Ψ′2    ⊢ Ψ2 ⇒ Ψ1
  ──────────────────────────────────────────  (weaken)
  F ; Ψ′1 ⊢ Ψ1

  m ≥ n
  ─────────────────────────────────────────────────────  (s-width)
  ⊢ {l1 ▷ p1, . . . , lm ▷ pm} ⇒ {l1 ▷ p1, . . . , ln ▷ pn}

  ⊢D p′ ⇒ p
  ─────────────────────────────  (s-depth)
  ⊢ Ψ ∪ {l ▷ p} ⇒ Ψ ∪ {l ▷ p′}

Fig. 3. Lc: Rules

We illustrate the composition rules using the example in Fig. 4. The figure uses informal graphs, but they can be translated into the formal syntax of Lc without much effort. Assume we already have judgments for two individual statements, depicted in the first column of Fig. 4. The first statement is an increment-by-one operation: if x > 0 before the statement, then x > 0 still holds after its completion. The second statement is if x < 10 goto l; it has one entry but two exits, and its entries and exits are associated with the assertions shown in the figure. The goal is to combine these two statements into a property of the two-statement block. Notice that the block is effectively a repeat-until loop: it repeatedly increments x until x reaches 10. For this loop, our goal is to prove that if x > 0 before entering the block, then x ≥ 10 after the completion of the block.

Figure 4 also presents the steps that derive the goal from the assumptions using Lc's composition rules. In step 1, we use the combine rule of Fig. 3. When combining two fragment sets F1 and F2, the combine rule makes the union of the entries of F1 and F2 the entries of the combined fragment set, and likewise for the exits. In our example, since both statements have one entry, we have two entries after the combine rule; since the first statement has one exit and the second has two, we have three exits after the combine rule.
After combining fragments, some label may be both an entry and an exit. For example, after step 1 in Fig. 4, the label l is both an entry and an exit; furthermore, the entry and the exit for l carry the same assertion, x > 0. In such a case, the discharge rule in Fig. 3 can eliminate l as an exit. Formally, the discharge rule states that if some l ▷ p appears on both the left and the right of the ⊢, it can be removed from the left; remember that exits are on the left, so this rule removes an exit. The label l1 is also both an entry and an exit, with the same assertion on both sides, so the discharge rule can remove l1 as an exit as well. Step 2 in Fig. 4 therefore applies the discharge rule twice, removing both l and l1 as exits; after this step, only one exit is left.

  Assumptions:         l ▷ x>0 —[x := x+1]→ l1 ▷ x>0;  l1 ▷ x>0 —[if x<10 goto l]→ exits l ▷ x>0 and l2 ▷ x≥10
  Step 1 (combine):    entries {l ▷ x>0, l1 ▷ x>0};  exits {l1 ▷ x>0, l ▷ x>0, l2 ▷ x≥10}
  Step 2 (discharge):  remove exits l1 and l;  exits {l2 ▷ x≥10}
  Step 3 (weaken):     remove entry l1;  entry {l ▷ x>0}, exit {l2 ▷ x≥10}   — the goal

Fig. 4. An example to illustrate Lc's composition rules

In the last step, we remove l1 as an entry using the weaken rule. The weaken rule uses a relation between two sets of label continuations, ⊢ Ψ1 ⇒ Ψ2, read as "Ψ1 is a stronger set of label continuations than Ψ2". The rule s-width in Fig. 3 states that a set of label continuations is stronger than any of its subsets; therefore ⊢ {l1 ▷ (x > 0), l ▷ (x > 0)} ⇒ {l ▷ (x > 0)} is derivable. Using this result and the weaken rule, step 3 in Fig. 4 removes the label l1 as an entry. After these steps, we have one entry and one exit left for the repeat-until loop, and we have proved the desired property of the loop.

A natural question is which labels the logic should keep as entries. The example eliminates l1 as an entry, while l remains; the reason is that the final goal dictates which labels should remain entries. In other scenarios we may want to keep l1 as an entry—for example, when other fragments need to jump to l1, which is possible under unstructured control flow even though l1 points into the middle of a loop. In general, the logic Lc itself does not decide which entries to keep and needs extra information.

The example has used almost all the composition rules, except s-depth. The s-depth rule states that a label continuation with a weaker precondition is stronger than one with a stronger precondition; the rule is contravariant in the preconditions. An example of using this rule together with the weaken rule is to derive F ; Ψ′ ⊢ {l ▷ p ∧ q} from F ; Ψ′ ⊢ {l ▷ p}.
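The composition steps of Fig. 4 are mechanical enough to replay in a few lines of code. The following is a minimal executable model—our own encoding, not the paper's formalization: a judgment F ; Ψ′ ⊢ Ψ is a pair of dictionaries mapping labels to assertions, assertions are opaque strings compared syntactically, and the side conditions of the weaken rule are not checked:

def combine(j1, j2):
    # combine: union the exits and union the entries.
    exits1, entries1 = j1
    exits2, entries2 = j2
    return ({**exits1, **exits2}, {**entries1, **entries2})

def discharge(j):
    # discharge: remove every exit l |> p that also appears as an entry l |> p
    # (the paper removes one per application; here we remove all at once).
    exits, entries = j
    return ({l: p for l, p in exits.items() if entries.get(l) != p}, entries)

def weaken_width(j, keep_entries):
    # s-width + weaken: a set of label continuations is stronger than its
    # subsets, so entries may be dropped.
    exits, entries = j
    return (exits, {l: p for l, p in entries.items() if l in keep_entries})

# Replaying Fig. 4: the two assumed judgments for "x := x+1" and
# "if x < 10 goto l".
inc = ({"l1": "x>0"}, {"l": "x>0"})
branch = ({"l": "x>0", "l2": "x>=10"}, {"l1": "x>0"})

step1 = combine(inc, branch)        # entries l, l1; exits l1, l, l2
step2 = discharge(step1)            # exits reduced to {l2: x>=10}
step3 = weaken_width(step2, {"l"})  # entry reduced to {l: x>0}
assert step3 == ({"l2": "x>=10"}, {"l": "x>0"})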
Deriving Hoare-logic rules. The composition rules of Lc can derive all Hoare-logic rules for common control-flow structures. We show the derivation of the while rule. Assume a while loop is implemented by the sequence in Equation (1) on page 81, abbreviated "while b do s". As we have mentioned, a Hoare triple {p}s{q} corresponds to {l : (s) : l′} ; {l′ ▷ q} ⊢ {l ▷ p} in Lc. With this correspondence, the derivation of the rule is as follows. Let

  (1) = {l : (if ¬b goto l′) : l1} ; {l′ ▷ p ∧ ¬b, l1 ▷ p ∧ b} ⊢ {l ▷ p}
        (derived from the if rule and the weaken rule, assuming ⊢D p ∧ ¬¬b ⇒ p ∧ b)
  (2) = {l2 : (goto l) : l′} ; {l ▷ p} ⊢ {l2 ▷ p}.

From (1), the premise {l1 : (s) : l2} ; {l2 ▷ p} ⊢ {l1 ▷ p ∧ b}, and (2), the combine rule gives

  {l : (while b do s) : l′} ; {l ▷ p, l1 ▷ p ∧ b, l2 ▷ p, l′ ▷ p ∧ ¬b} ⊢ {l ▷ p, l1 ▷ p ∧ b, l2 ▷ p};

discharging l ▷ p, l1 ▷ p ∧ b and l2 ▷ p gives

  {l : (while b do s) : l′} ; {l′ ▷ p ∧ ¬b} ⊢ {l ▷ p, l1 ▷ p ∧ b, l2 ▷ p};

and weakening the entries gives

  {l : (while b do s) : l′} ; {l′ ▷ p ∧ ¬b} ⊢ {l ▷ p}.

In the same spirit, Lc can derive the rules for many other control-flow structures, including sequential composition, repeat-until loops, and if-then-else statements. More examples are in the thesis [16, Chapter 2].

3 Semantics of Lc

In this section, we develop a semantics for Lc. We will show that a semantics based on pure continuations does not work; we adopt a semantics based on continuations together with approximations that count computation steps.

3.1 Operational Semantics for the Language

First, we present an operational semantics for the imperative language of Fig. 1. The semantics assumes an interpretation of the primitive symbols in OPSym and RSym: Val is a nonempty domain; for each op in OPSym, its semantics ⟦op⟧ is a function in Val^ar(op) → Val; and for each re in RSym, ⟦re⟧ is a relation ⊆ Val^ar(re).

A machine state is a triple (pc, π, m): a program counter pc, which is an address; an instruction memory π, which maps addresses to primitive statements or to an illegal statement; and a data memory m, which maps variables to values. Figure 5 lists the relevant semantic domains.

  values v ∈ Val                 Val is a nonempty domain
  addresses n ∈ Addr             Addr = N
  instruction memories π ∈ IM    IM = Addr → PrimStmt ∪ {illegal}
  data memories m ∈ DM           DM = Var → Val
  states σ ∈ Σ                   Σ = Addr × IM × DM
  label maps θ ∈ LMap            LMap = Label → Addr

Fig. 5. Semantic domains, where N is the domain of natural numbers

Before presenting the operational semantics, we introduce some notation. For a state σ, the projections control(σ), i_of(σ), and m_of(σ) give its program counter, instruction memory, and data memory, respectively. For a mapping m, the notation m[x ↦ v] denotes a new mapping that maps x to v and leaves the other slots unchanged.

The operational semantics is presented in Fig. 6 as a step relation σ →θ σ′ that executes the statement pointed to by the program counter. The semantics is conventional, except that it is parametrized over a label map θ ∈ LMap, which maps abstract labels to concrete addresses; when the next statement to execute is goto l, control transfers to θ(l).

  (pc, π, m) →θ σ′, where
    if π(pc) = x := e       then σ′ = (pc + 1, π, m[x ↦ V⟦e⟧m])
    if π(pc) = goto l       then σ′ = (θ(l), π, m)
    if π(pc) = if b goto l  then σ′ = (θ(l), π, m) if B⟦b⟧m = tt, and (pc + 1, π, m) otherwise

  where V : Exp → DM → Val and B : BExp → DM → {tt, ff} are defined by
    V⟦x⟧m = m(x)
    V⟦op(e1, . . . , e_ar(op))⟧m = ⟦op⟧(V⟦e1⟧m, . . . , V⟦e_ar(op)⟧m)
    B⟦true⟧m = tt
    B⟦b1 ∨ b2⟧m = tt if B⟦b1⟧m = tt or B⟦b2⟧m = tt, and ff otherwise
    B⟦¬b⟧m = tt if B⟦b⟧m = ff, and ff otherwise
    B⟦re(e1, . . . , e_ar(re))⟧m = tt if ⟨V⟦e1⟧m, . . . , V⟦e_ar(re)⟧m⟩ ∈ ⟦re⟧, and ff otherwise

Fig. 6. Operational semantics of the language in Fig. 1

In the operational semantics, if the current statement in a state σ is an illegal statement, then σ has no next state to step to; such a state is called a stuck state.
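The step relation of Fig. 6 transcribes directly into executable form. The following sketch uses our own tuple encoding of statements (not the paper's), with operators and relations passed as Python functions; as a usage example it runs the repeat-until loop of Fig. 4:

def V(e, m):
    if e[0] == "var":            # x
        return m[e[1]]
    if e[0] == "op":             # op(e1, ..., en), op given as a Python function
        _, fn, args = e
        return fn(*[V(a, m) for a in args])
    raise ValueError(e)

def B(b, m):
    if b[0] == "true":
        return True
    if b[0] == "or":
        return B(b[1], m) or B(b[2], m)
    if b[0] == "not":
        return not B(b[1], m)
    if b[0] == "rel":            # re(e1, ..., en), re given as a Python predicate
        _, rel, args = b
        return rel(*[V(a, m) for a in args])
    raise ValueError(b)

def step(state, theta):
    """One transition (pc, pi, m) -> sigma'; returns None on a stuck state."""
    pc, pi, m = state
    t = pi.get(pc, "illegal")
    if t == "illegal":
        return None
    if t[0] == "assign":                      # x := e
        _, x, e = t
        return (pc + 1, pi, {**m, x: V(e, m)})
    if t[0] == "goto":                        # goto l
        return (theta[t[1]], pi, m)
    if t[0] == "if":                          # if b goto l
        _, b, l = t
        return (theta[l], pi, m) if B(b, m) else (pc + 1, pi, m)
    raise ValueError(t)

# The loop of Fig. 4 loaded at addresses 100-101, with theta mapping l to 100
# and the exit l2 to 102.
theta = {"l": 100, "l2": 102}
pi = {100: ("assign", "x", ("op", lambda v: v + 1, [("var", "x")])),
      101: ("if", ("rel", lambda a, b: a < b,
                   [("var", "x"), ("op", lambda: 10, [])]), "l")}
s = (100, pi, {"x": 1})
while s is not None and s[0] != 102:
    s = step(s, theta)
assert s is not None and s[2]["x"] == 10     # x >= 10 at the exit, as proved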
If a state σ will not reach a stuck state within k steps, it is safe for k steps:

  safe_state(σ, k) ≜ ∀σ′ ∈ Σ. ∀j < k. σ →θ^j σ′ ⇒ ∃σ″. σ′ →θ σ″,

where →θ^j denotes j steps being taken.

3.2 Semantics of Lc

The semantics of Lc is centered on an interpretation of the judgment F ; Ψ′ ⊢ Ψ. We have discussed an informal interpretation: for the set of fragments F, if Ψ′ is true, then Ψ is true, where a label-continuation set Ψ being true means that it is safe to continue from any label in Ψ whenever the associated assertion is true. However, this interpretation is too naive, since it cannot justify the discharge rule. When both Ψ′ and Ψ in the discharge rule are empty sets, the rule becomes

  F ; {l ▷ p} ⊢ {l ▷ p}
  ──────────────────────
  F ; ∅ ⊢ {l ▷ p}

Under the informal interpretation, this rule states "from l ▷ p ⇒ l ▷ p, derive l ▷ p", which is clearly unsound. The problem is not that Lc is intrinsically unsound, but that the interpretation is too weak to exploit the invariants implicit in Lc.

The interpretation we adopt is stronger. The basic idea is a notion of label continuations being approximately true. The judgment F ; Ψ′ ⊢ Ψ is interpreted as: assuming the truth of Ψ′ at a lower approximation, Ψ is true at a higher approximation. In this inductive interpretation, Ψ′ and Ψ are treated differently, which allows the discharge rule to be justified by induction. Appel and McAllester proposed the indexed model [17], in which all predicates are approximated by counting computation steps, and our own work [18] used the indexed model to construct a semantic model for a typed assembly language. We adopt the idea of approximation by counting computation steps to develop a semantics for Lc.

Label continuations being approximately true. We first introduce a semantic function A : Assertion → DM → {tt, ff}, which gives meaning to assertions:

  A⟦∃x.p⟧m = tt if there exists d ∈ Val such that A⟦p[d/x]⟧m = tt, and ff otherwise.

On the other cases of p, the definition of A⟦p⟧m is the same as the definition of B (in Fig. 6), with every occurrence of B replaced by A.

Next, we introduce the notation σ; θ |=k l ▷ p to mean that the label continuation l ▷ p is k-approximately true in state σ relative to a label map θ:

  σ; θ |=k l ▷ p ≜ ∀σ′ ∈ Σ. σ →θ* σ′ ∧ control(σ′) = θ(l) ∧ A⟦p⟧(m_of(σ′)) = tt ⇒ safe_state(σ′, k),    (3)

where →θ* denotes that multiple steps are taken. Several points about this definition deserve clarification. First, l ▷ p being a true label continuation in σ to approximation k means that the state is safe to execute for k steps; in other words, the state will not get stuck within k steps. Second, the definition is relative to a label map θ, which translates the abstract label l to a concrete address. Last, the definition quantifies over all future states σ′ that σ can step to (including σ itself). The reason is that if σ; θ |=k l ▷ p, it should be safe to continue from location l whenever p is satisfied—not just now, but also in the future. In other words, if l ▷ p is true in the current state, it should also be true in all future states. Therefore, the definition of σ; θ |=k l ▷ p has to satisfy the following lemma:

Lemma 1. If σ →θ* σ′ and σ; θ |=k l ▷ p, then σ′; θ |=k l ▷ p.

By quantifying over all future states, the definition of σ; θ |=k l ▷ p satisfies the lemma.
In this respect, the semantics of σ; θ |=k l ▷ p is similar to the Kripke model [19, Ch. 2.5] of intuitionistic logic: knowledge is preserved from current states to future states. The semantics of a single label continuation extends to a set of label continuations:

  σ; θ |=k Ψ ≜ ∀(l ▷ p) ∈ Ψ. σ; θ |=k l ▷ p.

Loading statements. The predicate loaded(F, π, θ) describes the loading of a fragment set F into an instruction memory π with respect to a label map θ:

  loaded(F, π, θ) ≜ ∀(l : (t) : l′) ∈ F. π(θ(l)) = t ∧ θ(l′) = θ(l) + 1.

Note that some θ are not valid with respect to F. For example, if F = {l : (x := 1) : l′} and θ maps l to address 100, then to be consistent θ has to map l′ to address 101; this is why the definition requires θ(l′) = θ(l) + 1. (This definition is a simplification: the definition in the thesis [16] also requires that θ not map exit labels to addresses occupied by F; otherwise an exit label would not be a "true" exit label.)

Semantics of the judgment F ; Ψ′ ⊢ Ψ. We define a relation F ; Ψ′ |= Ψ, the semantic modeling of F ; Ψ′ ⊢ Ψ:

  F ; Ψ′ |= Ψ ≜ ∀σ ∈ Σ, θ ∈ LMap. loaded(F, i_of(σ), θ) ⇒ ∀k ∈ N. (σ; θ |=k Ψ′) ⇒ (σ; θ |=k+1 Ψ).

The definition quantifies over all label maps θ and all states σ such that F is loaded in the state with respect to θ. It derives the truth of Ψ to approximation k + 1 from the truth of Ψ′ to approximation k. In other words, if it is safe to continue from any of the labels in Ψ′ (whenever the associated assertion is true) for some number k of computation steps, then it is safe to continue from any of the labels in Ψ (whenever the associated assertion is true) for k + 1 computation steps. This inductive definition allows the discharge rule to be proved by induction over k.

We have given F ; Ψ′ |= Ψ a strong definition. But do the rules other than discharge support such a strong semantics? The answer is yes for Lc, because of one implicit invariant: for any derivable judgment F ; Ψ′ ⊢ Ψ, it takes at least one computation step from the labels in Ψ to reach the labels in Ψ′—that is, at least one step from the entries of F to reach an exit of F. Because of this invariant, although it is safe to continue from the exit labels only for k steps, we can still show that it is safe to continue from the entry labels for k + 1 steps.

Finally, since Lc also contains rules for deriving ⊢ Ψ ⇒ Ψ′ and ⊢D p, we define relations |= Ψ ⇒ Ψ′ and |= p to model their meanings:

  |= Ψ ⇒ Ψ′ ≜ ∀σ ∈ Σ, θ ∈ LMap, k ∈ N. (σ; θ |=k Ψ) ⇒ (σ; θ |=k Ψ′)
  |= p ≜ ∀m ∈ DM. A⟦p⟧m = tt.

Soundness and completeness. Based on this semantics, we now present soundness and completeness theorems for Lc. Due to space limits, we discuss the related concepts only informally and cannot present detailed proofs; they can be found in the thesis [16]. Since Lc is parametrized by a deduction system D that derives formulas in the assertion language, we must assume properties of D before proving properties of Lc: D is sound if ⊢D p implies |= p for every p, and complete if |= p implies ⊢D p for every p.

Theorem 1 (Soundness). Assume D is sound. If F ; Ψ′ ⊢ Ψ, then F ; Ψ′ |= Ψ.

The proof is by induction over the derivation of F ; Ψ′ ⊢ Ψ.
The most interesting case is the discharge rule, which is proved by induction over the number k of future computation steps.

Theorem 2 (Completeness). Assume D is complete and Assertion is expressive relative to the statement language. Assume Assertion is negatively testable by the statement language, and assume (F, Ψ′, Ψ) is normal. If F ; Ψ′ |= Ψ, then F ; Ψ′ ⊢ Ψ.

We explain informally the meanings of expressiveness, of Assertion being negatively testable, and of (F, Ψ′, Ψ) being normal; their precise definitions are in the thesis [16]. As pointed out by Cook [20], a program logic can fail to be complete if the assertion language is not powerful enough to express invariants for the loops in a program; therefore, the completeness theorem assumes that the assertion language is expressive. The theorem also assumes that the assertion language is negatively testable by the statement language: for any assertion p, there is a sequence of statements that terminates when p is false and diverges when p is true. The triple (F, Ψ′, Ψ) being normal means that any label is defined in F at most once; it also includes other sanity requirements on Ψ′ and Ψ.

4 Implementation in FPCC

This work is part of the Foundational Proof-Carrying Code (FPCC) project [4] at Princeton. FPCC verifies the memory safety of machine code from the smallest possible set of axioms—machine semantics plus logic. The safety proof is developed in two stages. First, we design a type system at the machine-code level; machine code is type checked in this type system, so a typing derivation is a safety witness. Second, we prove the soundness theorem for the type system: if machine code type checks, it is memory safe. This proof is developed from machine semantics plus logic, and is machine checked. The typing derivation composed with the soundness proof is the safety proof of the machine code.

The major research problem of the FPCC project is to prove the soundness of our type system—a low-level typed assembly language (LTAL [21]), which can check the memory safety of SPARC machine code generated by our ML compiler. When proving the soundness of LTAL, we found it easier to introduce an intermediate calculus to aid the proof, because a simple soundness proof was not a design goal of LTAL. We first prove that the intermediate calculus is sound from logic plus machine semantics; we then prove LTAL sound from the lemmas provided by the intermediate calculus.

The intermediate calculus in the FPCC project is Lc, together with a type theory [22] as the assertion language. By encoding the semantics of Lc, as presented above, on top of the SPARC machine, we have proved Lc sound with machine-checked proofs in Twelf. We then prove LTAL sound from the lemmas provided by Lc. The first author's thesis [16, Chapter 3] covers the step from Lc to LTAL in detail. Here we discuss only one point, about control-flow structures in machine code. The simple language of Section 2, on which we presented Lc, lacks indirect jumps, pc-relative jumps, and procedure calls. Our implementation on SPARC handles pc-relative jumps, and handles indirect jumps using first-class continuation types in the assertion language. As our implementation shows, these features do not affect the soundness result of Lc. However, we have not investigated the impact of indirect jumps on the completeness result, which is unnecessary for the FPCC project.
We have not modeled procedure call-and-return: since our compiler uses continuation-passing style, continuation calls and continuation passing suffice. Procedure calls, if needed, could be handled by following the work of Benton [13].

5 Conclusion and Future Work

Previous program logics for goto statements are too weak to modularly reason about program fragments with multiple entries and multiple exits. We have presented Lc, which needs only local information to reason about a program fragment and composes program fragments in an elegant way. Lc is useful not only for reasoning about unstructured control flow in machine languages, but also for deriving rules for common control-flow structures. We have also presented a semantics for Lc, based on which the soundness and completeness theorems are formally proved. We have implemented Lc on top of the SPARC machine language; the implementation has been embedded into the Foundational Proof-Carrying Code project to produce memory-safety proofs for machine-language programs. One possible future extension is to combine this work with modules, to produce a module system with simple composition rules and a semantics based on counting computation steps.

References

1. Hoare, C.A.R.: An axiomatic basis for computer programming. Communications of the Association for Computing Machinery 12 (1969) 578–580
2. Necula, G.: Proof-carrying code. In: 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, New York, ACM Press (1997) 106–119
3. Morrisett, G., Walker, D., Crary, K., Glew, N.: From System F to typed assembly language. ACM Trans. on Programming Languages and Systems 21 (1999) 527–568
4. Appel, A.W.: Foundational proof-carrying code. In: Symposium on Logic in Computer Science (LICS '01), IEEE (2001) 247–258
5. Clint, M., Hoare, C.A.R.: Program proving: Jumps and functions. Acta Informatica (1972) 214–224
6. Kowaltowski, T.: Axiomatic approach to side effects and general jumps. Acta Informatica 7 (1977) 357–360
7. Arbib, M., Alagic, S.: Proof rules for gotos. Acta Informatica 11 (1979) 139–148
8. de Bruin, A.: Goto statements: Semantics and deduction systems. Acta Informatica 15 (1981) 385–424
9. O'Donnell, M.J.: A critique of the foundations of Hoare-style programming logics. Communications of the Association for Computing Machinery 25 (1982) 927–935
10. Floyd, R.W.: Assigning meanings to programs. In: Proceedings of Symposia in Applied Mathematics, Providence, Rhode Island (1967) 19–32
11. Cardelli, L.: Program fragments, linking, and modularization. In: 24th ACM Symposium on Principles of Programming Languages (1997) 266–277
12. Glew, N., Morrisett, G.: Type-safe linking and modular assembly language. In: 26th ACM Symposium on Principles of Programming Languages (1999) 250–261
13. Benton, N.: A typed, compositional logic for a stack-based abstract machine. In: 3rd Asian Symposium on Programming Languages and Systems (2005)
14. Ni, Z., Shao, Z.: Certified assembly programming with embedded code pointers. In: 33rd ACM Symposium on Principles of Programming Languages (2006) To appear
15. Saabas, A., Uustalu, T.: A compositional natural semantics and Hoare logic for low-level languages. In: Proceedings of the Second Workshop on Structured Operational Semantics (SOS'05) (2005)
16. Tan, G.: A Compositional Logic for Control Flow and its Application in Foundational Proof-Carrying Code. PhD thesis, Princeton University (2005)
17. Appel, A.W., McAllester, D.: An indexed model of recursive types for foundational proof-carrying code. ACM Trans. on Programming Languages and Systems 23 (2001) 657–683
18. Tan, G., Appel, A.W., Swadi, K.N., Wu, D.: Construction of a semantic model for a typed assembly language. In: Fifth International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI 04) (2004) 30–43
19. Sørensen, M.H., Urzyczyn, P.: Lectures on the Curry-Howard isomorphism. Available as DIKU Rapport 98/14 (1998)
20. Cook, S.A.: Soundness and completeness of an axiom system for program verification. SIAM Journal on Computing 7 (1978) 70–90
21. Chen, J., Wu, D., Appel, A.W., Fang, H.: A provably sound TAL for back-end optimization. In: ACM Conference on Programming Language Design and Implementation (2003) 208–219
22. Swadi, K.N.: Typed Machine Language. PhD thesis, Princeton University (2003)

Detecting Non-cyclicity by Abstract Compilation into Boolean Functions

Stefano Rossignoli and Fausto Spoto

Dipartimento di Informatica, Università di Verona, Strada Le Grazie, 15, 37134 Verona, Italy
stefano.rossignoli@students.univr.it, fausto.spoto@univr.it

Abstract. Programming languages such as C, C++ and Java bind variables to dynamically-allocated data structures held in memory. This lets programs build cyclical data at run-time, which complicates termination analysis and garbage collection. It is hence desirable to spot those variables which are only bound to non-cyclical data at run-time. We solve this problem by using abstract interpretation to define the abstract domain NC representing those variables. We relate NC through a Galois insertion to the concrete domain of program states; hence NC is not redundant. We define a correct abstract denotational semantics over NC, which uses preliminary sharing information between variables to get more precise results, and we apply it to a simple example of analysis. We use a Boolean representation for the abstract denotations over NC, which leads to an efficient implementation in terms of binary decision diagrams and to the elegant and efficient use of abstract compilation.

1 Introduction

Programming languages such as C, C++ and Java allocate dynamic data structures on the heap. These data structures might contain cycles, which hinder termination analysis and complicate garbage collection. Consider the classes in Figure 1, in the syntax of Section 4. (We note here that with introduces local variables, and that the variable out holds the return value of a method.) These classes implement a list of subscriptions to a service, such as cable television. Each subscription refers to a person. Some subscriptions come from abroad and have a higher monthly cost. The method foreign, applied to a list of subscriptions, builds a new list containing only the foreign ones. If, in a call such as l1 := l2.foreign(), we know that l2 is bound to a non-cyclical data structure (from now on: that l2 is a non-cyclical variable), then

1. techniques for termination analysis based on decreasing norms, similar to those for logic programs [1], can prove that the call terminates; non-cyclicity plays here the role of the occur-check in logic programming;
2. a (single-pass) reference-counting garbage collector can be applied to l1 and l2 when they go out of scope, since they do not lead to cycles. This is faster than a (two-pass) mark-and-sweep collector and has smaller pause times.
class Object {}
class Person extends Object { int age; boolean sex; }
class Subs extends Object {
  Person subscriber;
  int numOfChannels;
  Subs next;

  int monthlyCost() { out := numOfChannels / 2 }   // in euros

  ForeignSubs foreign() with temp:Subs {
    temp := this.next;
    if (temp = null) then {} else out := temp.foreign()
  }
}
class ForeignSubs extends Subs {
  int monthlyCost() { out := numOfChannels * 2 }   // more expensive

  ForeignSubs foreign() with temp:Subs, sub:Person {
    sub := this.subscriber;
    out := new ForeignSubs;                        // program point *
    temp := this.next;
    if (temp = null) then {} else out.next := temp.foreign();
    out.subscriber := sub;
    out.numOfChannels := this.numOfChannels
  }
}

Fig. 1. Our running example: a list of subscriptions to a service

Hence detecting the non-cyclical variables helps program verification and optimises the run-time support. One might derive non-cyclicity by abstracting shape analyses [10] or those alias and sharing analyses which build a graph representation of the run-time heap of the system [9]. In this paper, we follow a different approach: we use abstract interpretation [5] to define a domain NC for non-cyclicity which is more abstract than graphs. NC is exactly the property we want to observe i.e., the set of non-cyclical variables. Because of the simplicity of NC,

– a Galois insertion relates NC to the concrete domain of program states; hence NC is not redundant [5];
– ours is a denotational, and hence completely relational, analysis which denotes each method by its interpretation i.e., its abstract input/output behaviour;
– we represent these behaviours through Boolean functions, which can be efficiently implemented through binary decision diagrams [4];
– we use abstract compilation [6] into Boolean functions, an elegant and efficient implementation of abstract interpretation.

Our analysis is possibly less precise than abstractions of graph-based analyses (such as some shape, alias or sharing analyses), which keep explicit information on the names of the fields of the objects in the heap; this information is abstracted away in NC. But those analyses do not enjoy all the properties stated above for NC, are much more concrete than NC, and have never actually been abstracted into a non-cyclicity analysis. The precision of NC can be significantly improved by using preliminary sharing information between program variables [8].

The paper is organised as follows. Section 2 presents our analysis. Section 3 reports the preliminaries. Section 4 gives the syntax and semantics of our simple object-oriented language. Section 5 defines the domain NC. Section 6 gives a denotational semantics over NC. Section 7 represents denotations over NC as Boolean functions. Section 8 concludes. Proofs are in [7].

2 Overview of the Analysis

We show here, informally, how abstract compilation into Boolean functions works for our non-cyclicity analysis; this will be formalised in subsequent sections. Consider the foreign method of class Subs in Figure 1. We want to prove that its result is always non-cyclical. Our analysis starts by compiling all methods into Boolean functions, following the compilation rules in Figure 7. The result, for the method Subs.foreign, is in Figure 2. Each program variable v is split into its input and output versions v̌ and v̂.
For instance, the compilation of s = (temp := this.next) in Figure 1 is the formula φ1 in Figure 2 (we write the input and output marks postfix, as in thisˇ and tempˆ). It states that if this is non-cyclical before the execution of s then temp is non-cyclical after the execution of s (i.e., thisˇ → tempˆ), and that s does not affect the non-cyclicity of this and out (i.e., (thisˇ → thisˆ) ∧ (outˇ → outˆ)). The then and else branches of the conditional are compiled into φ2 and φ3, respectively, and the conditional itself into φ2 ∨ φ3. Formula φ2 states that temp is null, and hence non-cyclical, at the beginning of the then branch (i.e., tempˇ), and that the branch does not change the cyclicity of any variable (i.e., (thisˇ → thisˆ) ∧ (outˇ → outˆ) ∧ (tempˇ → tempˆ)). Formula φ3 is rather complex, since it is the compilation of a method call; we discuss it in subsequent sections. Here, just note that it refers to the unknown formulas I(Subs.foreign) and I(ForeignSubs.foreign), the current interpretations of those methods, which express what we already know about their abstract behaviour. Variable res holds the value of the method call expression. The operation ◦ on Boolean functions corresponds to sequential composition of denotations and is defined in Section 7.

  ForeignSubs foreign() with temp:Subs {
    φ1 ◦ (φ2 ∨ φ3), where
      φ1 = (thisˇ → tempˆ) ∧ (thisˇ → thisˆ) ∧ (outˇ → outˆ)
      φ2 = tempˇ ∧ (thisˇ → thisˆ) ∧ (outˇ → outˆ) ∧ (tempˇ → tempˆ)
      φ3 = the compilation of out := temp.foreign(): it renames temp into the
           callee's this, composes the current interpretations I(Subs.foreign)
           and I(ForeignSubs.foreign) of the possible targets, and copies the
           result res into out (resˇ → outˆ)
  }

Fig. 2. The Boolean compilation of the Subs.foreign method in Figure 1 (φ3 shown schematically)

After abstract compilation, our analysis builds a bottom interpretation I, which it uses to evaluate the body (formula) of each method. This yields a new interpretation. The process is iterated until a fixpoint is reached, which is the result of the analysis; it expresses a completely relational input/output behaviour for the methods [5, 3]. In our case, the method in Figure 2 is interpreted by the formula outˆ i.e., its return value is always non-cyclical (Example 16).
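To make the role of ◦ concrete before Section 7, here is a small executable sketch. It represents a Boolean formula over input/output variables as a Python predicate on two sets of non-cyclical variables, and composes two formulas by existentially quantifying over the intermediate assignment; the actual implementation uses binary decision diagrams, so this encoding is ours, for illustration only:

from itertools import chain, combinations

VARS = {"this", "temp", "out"}

def all_subsets(vs):
    vs = list(vs)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(vs, r)
                                         for r in range(len(vs) + 1))]

def phi1(i, o):
    # (this^check -> temp^hat) ∧ (this^check -> this^hat) ∧ (out^check -> out^hat)
    return (("this" not in i or "temp" in o) and
            ("this" not in i or "this" in o) and
            ("out" not in i or "out" in o))

def compose(f, g, vs=VARS):
    """Sequential composition: some intermediate non-cyclicity assignment
    satisfies both formulas."""
    return lambda i, o: any(f(i, m) and g(m, o) for m in all_subsets(vs))

# Example: compose phi1 with a formula copying temp's status into out.
def copy_temp_to_out(i, o):
    return ("temp" not in i or "out" in o) and ("this" not in i or "this" in o)

h = compose(phi1, copy_temp_to_out)
assert h(frozenset({"this"}), frozenset({"this", "out"}))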
3 Preliminaries

A total function is denoted by →, a partial function by ⇀. The domain (codomain) of f is dom(f) (rng(f)). We denote by [v1 ↦ t1, . . . , vn ↦ tn] the function f where dom(f) = {v1, . . . , vn} and f(vi) = ti for i = 1, . . . , n. Its update is f[w1 ↦ d1, . . . , wm ↦ dm], where the domain may be enlarged. By f|s (respectively, f|−s) we denote the restriction of f to s ⊆ dom(f) (to dom(f) \ s). If f(x) = x then x is a fixpoint of f. The composition f ◦ g of functions f and g is such that (f ◦ g)(x) = g(f(x)), so we often denote it by gf. The components of a pair are separated by ⋆; a definition of S such as S = a ⋆ b, with a and b meta-variables, silently defines the pair selectors s.a and s.b for s ∈ S.

A poset S ⋆ ≤ is a set S with a reflexive, transitive and antisymmetric relation ≤. Given s ∈ S, we define ↓s = {s′ ∈ S | s′ ≤ s}. If C ⋆ ≤ and A ⋆ ⊑ are posets (the concrete and the abstract domain), a Galois connection [5] is a pair of monotonic maps α : C → A and γ : A → C (i.e., α(c1) ⊑ α(c2) if c1 ≤ c2, and similarly for γ) such that γα is extensive (c ≤ γα(c) for any c ∈ C) and αγ is reductive (αγ(a) ⊑ a for any a ∈ A). It is a Galois insertion if αγ is the identity map i.e., if A contains no redundancy i.e., if γ is one-to-one. An abstract operator f̂ : Aⁿ → A is correct w.r.t. a concrete f : Cⁿ → C if αfγ ⊑ f̂.

4 Our Simple Object-Oriented Language

Syntax. Variables are typed and bound to values. We do not consider primitive types, since they cannot be used to build cycles.

Definition 1. A program has a set of variables V (including res, out, this) and a finite set of classes (or types) K ordered by a subclass relation ≤. A type environment specifies the types of a finite set of variables; it is any element of the set TypEnv = {τ : V ⇀ K | dom(τ) is finite}. Later on, τ will stand for a type environment.

Type environments describe both the variables in scope at a given program point and the fields of a given class κ ∈ K, written F(κ).

Example 1. In Figure 1 we have K = {Object, Person, Subs, ForeignSubs}, where Object is the top of the hierarchy and ForeignSubs ≤ Subs. Since we are not interested in primitive types, we have F(Object) = F(Person) = [] and F(Subs) = F(ForeignSubs) = [subscriber ↦ Person, next ↦ Subs].

Expressions and commands are normalised versions of Java's: only distinct variables can be actual parameters of a method call; leftvalues are only a variable or the field of a variable; conditionals only check equality or nullness of variables; loops are implemented through recursion. These simplifying assumptions can be relaxed without affecting subsequent results. It is significant that we allow downward casts, since reachability (Definition 6) depends on their presence.

Definition 2. Expressions and commands are

  exp ::= null κ | new κ | v | v.f | (κ)v | v.m(v1, . . . , vn)
  com ::= v := exp | v.f := exp | {com; · · · ; com}
        | if v = w then com else com | if v = null then com else com

where κ ∈ K and v, w, v1, . . . , vn ∈ V are distinct. (The null constant is decorated with the class κ induced by its context, as in v := null κ, where κ is the type of v; this way we do not need a distinguished type for null. We assume that the compiler provides κ.) Each method κ.m is defined in class κ as

  κ0 m(w1 : κ1, . . . , wn : κn) with wn+1 : κn+1, . . . , wn+m : κn+m is com,

where w1, . . . , wn+m ∈ V are distinct, not in {out, res, this}, and have static types κ1, . . . , κn+m ∈ K, respectively. Variables w1, . . . , wn are the formal parameters of the method, and wn+1, . . . , wn+m are its local variables. The method also uses a variable out of type κ0 to store its return value. We let body(κ.m) = com, returnType(κ.m) = κ0, input(κ.m) = [this ↦ κ, w1 ↦ κ1, . . . , wn ↦ κn], output(κ.m) = [out ↦ κ0], locals(κ.m) = [wn+1 ↦ κn+1, . . . , wn+m ↦ κn+m] and scope(κ.m) = input(κ.m) ∪ output(κ.m) ∪ locals(κ.m).

Example 2. For ForeignSubs.foreign in Figure 1 (just foreign below) we have input(foreign) = [this ↦ ForeignSubs], output(foreign) = [out ↦ ForeignSubs] and locals(foreign) = [sub ↦ Person, temp ↦ Subs].

Our language is strongly typed i.e., an expression exp has a static (compile-time) type type_τ(exp) in τ, consistent with its run-time values (see [7]).

Semantics. We use a denotational semantics, hence compositional, in the style of [11], but with more complex states which include a heap. By using a denotational semantics, our states contain only a single frame rather than an activation stack of frames; a method call is resolved by plugging the interpretation of the method (Definition 5) into its calling context. This is standard in denotational semantics and has been used for years in logic programming [3]. A frame binds variables (identifiers) to locations or null. A memory binds such locations to objects, which contain a class tag and a frame for their fields.

Definition 3. Let Loc be an infinite set of locations.
We define frames, objects and memories as Frame_τ = {φ | φ ∈ dom(τ) → Loc ∪ {null}}, Obj = {κ ⋆ φ | κ ∈ K, φ ∈ Frame_F(κ)} and Memory = {µ ∈ Loc ⇀ Obj | dom(µ) is finite}. A new object of class κ is new(κ) = κ ⋆ φ, with φ(f) = null for each f ∈ F(κ).

Example 3. Figure 3 shows a frame φ and a memory µ. Objects are shown with a class tag and a local frame mapping fields to locations or null. For instance, variable this is bound to location l1, and µ(l1) is a ForeignSubs object. The state in Figure 3 might be the current state of an interpreter at program point * in Figure 1.

  φ = [temp ↦ null, this ↦ l1, out ↦ l3, sub ↦ l4]
  µ = [l1 ↦ ForeignSubs ⋆ [subscriber ↦ l4, next ↦ l2],
       l2 ↦ Subs ⋆ [subscriber ↦ l5, next ↦ l1],
       l3 ↦ ForeignSubs ⋆ [subscriber ↦ null, next ↦ null],
       l4 ↦ Person ⋆ [],  l5 ↦ Person ⋆ []]

Fig. 3. A state (frame φ and memory µ) for the type environment τ = [temp ↦ Subs, this ↦ ForeignSubs, out ↦ ForeignSubs, sub ↦ Person]

The states of the computation are made of a frame φ and a memory µ. We assume type correctness, φ ⋆ µ : τ, which bans dangling pointers and requires that variables and fields be bound to null or to locations containing objects allowed by their type. This constraint is sensible for strongly-typed languages which use a conservative garbage collector; for a formal definition, see [7].

Definition 4. Let τ be the type environment at a given program point p. The set of possible states at p is Στ = {φ ⋆ µ | φ ∈ Frame_τ, µ ∈ Memory, φ ⋆ µ : τ}.

Example 4. In Figure 3, the pair φ ⋆ µ contains no dangling pointers and respects the typing of variables and fields. Hence φ ⋆ µ : τ and φ ⋆ µ ∈ Στ.

Denotations are the input/output semantics of a piece of code; interpretations provide a denotation for each method.

Definition 5. An interpretation I maps methods to denotations such that, for each method κ.m, we have I(κ.m) : Σ_input(κ.m) ⇀ Σ_output(κ.m).

We describe the denotations for our language informally; their formal definition is in [7]. Expressions have side-effects and return a value. Hence their denotations are partial maps from an initial state to a final state containing a distinguished variable res that holds the expression's value: E^I_τ⟦ ⟧ : exp → (Στ ⇀ Σ_{τ+exp}), where res ∉ dom(τ), τ + exp = τ[res ↦ type_τ(exp)] and I is an interpretation. Namely, given an input state φ ⋆ µ, the denotation of null κ binds res to null in φ. That of new κ binds res to a new location holding a new object of class κ. That of v copies v into res. That of v.f accesses the object o = µ(φ(v)) bound to v (provided φ(v) ≠ null) and copies the field f of o (i.e., o.φ(f)) into res. That of (κ)v copies v into res, but only if the cast is satisfied. That of a method call uses the dynamic class of the receiver to fetch the denotation of the method from I, and plugs it into the calling context by building a starting state σ† for the method, whose formal parameters (including this) are bound to the actual parameters.

The denotation of a command is a partial map from an initial state to a final state: C^I_τ⟦ ⟧ : com → (Στ ⇀ Στ), where res ∉ dom(τ).
The denotation of v := exp uses that of exp to get a state where res holds exp's value; it then copies res into v and removes res. Similarly for v.f := exp, except that res is copied into the field f of the object bound to v, if any. The denotations of the conditionals check their guard and then use the denotation of the then branch or that of the else branch. The denotation of a sequence of commands is the functional composition of their denotations.

We use C^I_τ⟦ ⟧ to define a transformer on interpretations. It evaluates the methods' bodies in I, expanding the input state with local variables bound to null, and restricts the final state to out, to respect Definition 5. This corresponds to the immediate consequence operator used in logic programming [3]. Its least fixpoint is the denotational semantics of the program (see [7]).

5 Non-cyclicity

A variable v is non-cyclical if there is no cycle among its reachable locations. If a location l is bound, in a memory µ, to an object o = µ(l), then the locations bound to o's fields (i.e., rng(o.φ) ∩ Loc) are reachable from l. Reachability is the transitive closure of this relation, passing through one or more objects.

Definition 6. Let µ ∈ Memory and l ∈ dom(µ). The set of locations reachable in µ from l is L(µ)(l) = ∪{L^i(µ)(l) | i ≥ 0}, where L^i : Memory → Loc → ℘(Loc), for i ≥ 0, is such that L^0(µ)(l) = rng(µ(l).φ) ∩ Loc and L^{i+1}(µ)(l) = ∪{rng(µ(j).φ) ∩ Loc | j ∈ L^i(µ)(l)}.

We let all fields rng(µ(j).φ) of the object µ(j) be reachable, since we allow checked casts (Section 4), so that all fields of an object can be accessed.

Example 5. In Figure 3 we have L^0(µ)(l1) = {l2, l4}, L^1(µ)(l1) = {l1, l5} and L^2(µ)(l1) = {l2, l4} = L^0(µ)(l1); moreover, L^0(µ)(l3) = L^0(µ)(l4) = ∅. Hence L(µ)(l1) = {l1, l2, l4, l5} and L(µ)(l3) = L(µ)(l4) = ∅: l1 is reachable from l1, while no location is reachable from l3 nor from l4.

A location l is not necessarily reachable from itself, as Example 5 shows for l3 and l4; that happens only if there is a path starting at l and passing back through l. This is the case for l1 in Figure 3, so we say that l1 is cyclical there. But l is also cyclical if some cyclical location l′ can be reached from l.

Definition 7. Let µ ∈ Memory. A location l ∈ dom(µ) is cyclical in µ if there is l′ ∈ L(µ)(l) such that l′ ∈ L(µ)(l′). A variable v ∈ dom(τ) is cyclical in φ ⋆ µ ∈ Στ if φ(v) ≠ null and φ(v) is cyclical in µ; it is non-cyclical otherwise.

We can hence define the set of states where all variables in a set nc are non-cyclical; nothing is known about the others.

Definition 8. Let nc ⊆ dom(τ). The set of states in Στ where the variables nc are non-cyclical is γτ(nc) = {φ ⋆ µ ∈ Στ | the variables nc are non-cyclical in φ ⋆ µ}.

Example 6. Let φ ⋆ µ be as in Figure 3. By Example 5, φ ⋆ µ ∈ γτ({temp, out, sub}), φ ⋆ µ ∈ γτ({temp, out}), φ ⋆ µ ∈ γτ({sub}) and φ ⋆ µ ∈ γτ(∅), but φ ⋆ µ ∉ γτ({this, out}), since variable this is cyclical in φ ⋆ µ.

Definition 8 might suggest that an abstract domain for definite non-cyclicity is ℘(dom(τ)). However, this would result in a Galois connection rather than a Galois insertion with ℘(Στ). For instance, in Figure 3 we have τ(sub) = Person; it is hence redundant to ask sub to be non-cyclical, since no cycle can be reached from a Person. As a consequence, γτ({sub}) = γτ(∅) i.e., γτ is not one-to-one and, hence, is not the concretisation map of a Galois insertion (Section 3).
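Definitions 6 and 7 are easy to make executable. The following sketch uses our own encoding of memories as Python dictionaries mapping locations to (class, fields) pairs, and replays Example 5 on the state of Fig. 3:

def reachable(mu, l):
    """L(mu)(l): the locations reachable from l through one or more objects."""
    frontier = {v for v in mu[l][1].values() if v is not None}   # L^0
    seen = set(frontier)
    while frontier:
        frontier = {v for j in frontier
                      for v in mu[j][1].values() if v is not None} - seen
        seen |= frontier
    return seen

def cyclical_loc(mu, l):
    # Definition 7: some location reachable from l is reachable from itself.
    return any(lp in reachable(mu, lp) for lp in reachable(mu, l))

mu = {"l1": ("ForeignSubs", {"subscriber": "l4", "next": "l2"}),
      "l2": ("Subs",        {"subscriber": "l5", "next": "l1"}),
      "l3": ("ForeignSubs", {"subscriber": None, "next": None}),
      "l4": ("Person", {}), "l5": ("Person", {})}

assert reachable(mu, "l1") == {"l1", "l2", "l4", "l5"}   # as in Example 5
assert reachable(mu, "l3") == set() == reachable(mu, "l4")
assert cyclical_loc(mu, "l1") and not cyclical_loc(mu, "l3")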
To get a Galois insertion, we must restrict to those variables whose static type is cyclical, so that they can be either cyclical or non-cyclical at run-time. To define cyclicity for types (classes), we first need a notion of reachability for types. From a class κ we can reach the classes C = rng(F(κ)) of its fields F(κ), and those of every subclass κ′ ≤ κ. Moreover, we can reach every subclass of the classes in C (because of casts, see Section 4), the classes of their fields, and so on, recursively.

Definition 9. The classes reachable from κ ∈ K are C(κ) = ∪{C^i(κ) | i ≥ 0}, where C^0(κ) = ↓∪{rng(F(κ′)) | κ′ ≤ κ} and C^{i+1}(κ) = ↓∪{rng(F(κ′)) | κ′ ∈ C^i(κ)}.

Example 7. In Figure 1, for every i ≥ 0 we have C^i(Person) = ∅ and C^i(Object) = C^i(Subs) = C^i(ForeignSubs) = {Person, Subs, ForeignSubs}. Thus C(Person) = ∅ and C(Object) = C(Subs) = C(ForeignSubs) = {Person, Subs, ForeignSubs}.

We can now define cyclicity for classes.

Definition 10. Let κ ∈ K. Class κ is cyclical if and only if there exists κ′ ∈ C(κ) such that κ′ ∈ C(κ′). Given τ, the set of variables of non-cyclical static type is NC_τ = {v ∈ dom(τ) | τ(v) is non-cyclical}.

Example 8. From Example 7 we conclude that Object, Subs and ForeignSubs in Figure 1 are cyclical, while Person is non-cyclical. Object is recognised as cyclical since Subs ∈ C(Object) and Subs ∈ C(Subs). Similarly, ForeignSubs is recognised as cyclical by taking κ′ = ForeignSubs.

We can now define the abstract domain for definite non-cyclicity. Its elements are those subsets of dom(τ) which contain all variables of non-cyclical type.

Definition 11. The abstract domain for definite non-cyclicity is NCτ = {nc ∈ ℘(dom(τ)) | NC_τ ⊆ nc}, ordered by inverse set-inclusion (dom(τ) is the least element, NC_τ is the top element). Its concretisation map is the γτ of Definition 8, restricted to NCτ.

Example 9. From Example 8, in Figure 3 we have NC_τ = {sub}, and {temp, out, sub} ∈ NCτ, {this, out, sub} ∈ NCτ and {sub} ∈ NCτ, but ∅ ∉ NCτ and {this} ∉ NCτ.

We state now the main result of this section: our abstract domain for non-cyclicity actually induces a Galois insertion with the concrete domain.

Proposition 1. The map γτ of Definition 8 is the concretisation map of a Galois insertion between NCτ and ℘(Στ). The induced abstraction map, for any S ⊆ Στ, is ατ(S) = {v ∈ dom(τ) | v is non-cyclical in every φ ⋆ µ ∈ S}.
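Definitions 9 and 10 similarly admit a direct executable reading. In the following sketch (our own encoding), the class tables encode Figure 1, and the asserts replay Examples 7 and 8:

SUBCLASSES = {"Object": {"Object", "Person", "Subs", "ForeignSubs"},
              "Person": {"Person"},
              "Subs": {"Subs", "ForeignSubs"},
              "ForeignSubs": {"ForeignSubs"}}
FIELDS = {"Object": {}, "Person": {},
          "Subs": {"subscriber": "Person", "next": "Subs"},
          "ForeignSubs": {"subscriber": "Person", "next": "Subs"}}

def down(classes):
    # Down-closure under subclassing: every subclass is reachable via casts.
    return {k2 for k in classes for k2 in SUBCLASSES[k]}

def reachable_classes(kappa):
    """C(kappa): the union of the chain C^0, C^1, ..."""
    cur = down({t for k in SUBCLASSES[kappa] for t in FIELDS[k].values()})
    while True:
        nxt = cur | down({t for k in cur for t in FIELDS[k].values()})
        if nxt == cur:
            return cur
        cur = nxt

def cyclical_class(kappa):
    # Definition 10: some reachable class reaches itself.
    return any(k in reachable_classes(k) for k in reachable_classes(kappa))

assert reachable_classes("Person") == set()
assert reachable_classes("Object") == {"Person", "Subs", "ForeignSubs"}
assert cyclical_class("Subs") and not cyclical_class("Person")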
6 Abstract Semantics for Non-cyclicity

We define here an abstract denotational semantics over NC. It builds a chain of non-cyclicity interpretations until a fixpoint is reached [3].

Definition 12. A non-cyclicity interpretation I maps methods to abstract denotations such that I(κ.m) : NC_input(κ.m) → NC_output(κ.m) for each method κ.m.

Example 10. A non-cyclicity interpretation for Figure 1 might be such that I(Subs.foreign)(nc) = I(ForeignSubs.foreign)(nc) = {out} for any nc. That is, the output of these methods is non-cyclical, whatever the input is. This is sensible, since they return a new, non-cyclical ForeignSubs if they do not diverge. This I is in fact the bottom interpretation, which, for every method κ.m, maps every input to the least element of NC_output(κ.m).

Proposition 2. Figures 4 and 5 report abstract denotations for our language. They are correct w.r.t. the concrete denotations described in Section 4.

Note that Figures 4 and 5 use preliminary information on the pairs of variables which share i.e., are bound to overlapping data structures [8]. We have used abstract interpretation to derive Figures 4 and 5: for every concrete operation (denotation, since we deal with denotational semantics) op and abstract input nc, those figures provide a correct approximation nc′ of α(op(γ(nc))). This means that the variables in nc′ are definitely non-cyclical in every concrete state σ′ obtained by applying op to any concrete state σ in which the variables in nc are definitely non-cyclical.

Let us discuss those figures. For expressions, we have NCE^I_τ⟦exp⟧ : NCτ → NC_{τ+exp}, where res ∉ dom(τ) and I is a non-cyclicity interpretation. Variable res, in the final state, refers to the value of exp (Section 4); hence, over NC, res belongs to nc′ if the analysis is able to prove that the value of exp is definitely non-cyclical.

  NCE^I_τ⟦null κ⟧(nc) = NCE^I_τ⟦new κ⟧(nc) = nc ∪ {res}

  NCE^I_τ⟦v⟧(nc) = NCE^I_τ⟦(κ)v⟧(nc) = nc ∪ {res} if v ∈ nc, and nc otherwise

  NCE^I_τ⟦v.f⟧(nc) = nc ∪ {res} if v ∈ nc or F(τ(v))(f) is non-cyclical, and nc otherwise

  NCE^I_τ⟦v.m(v1, . . . , vn)⟧(nc) = (nc \ SE) ∪ {res} if nc′ = {out}, and nc \ SE if nc′ = ∅,
    where nc† = (nc ∩ {v, v1, . . . , vn})[v ↦ this, v1 ↦ w1, . . . , vn ↦ wn],
          nc′ = ∩{I(κ.m)(nc†) | κ.m can be called here}, and
          SE = {w ∈ dom(τ) | w shares with some of {v, v1, . . . , vn}} \ NC_τ.

Fig. 4. The abstract denotational semantics of the expressions

  NCC^I_τ⟦v := exp⟧ = NCE^I_τ⟦exp⟧ ◦ setVar^v_{τ+exp},
    where setVar^v_{τ′}(nc) = (nc \ {res}) ∪ {v} if res ∈ nc, and nc \ {v} otherwise.

  NCC^I_τ⟦v.f := exp⟧ = NCE^I_τ⟦exp⟧ ◦ setField^{v.f}_{τ+exp},
    where setField^{v.f}_{τ′}(nc) = nc \ {res} if F(τ′(v))(f) is non-cyclical
          or (res ∈ nc and res does not share with v),
          and (nc \ SE) \ {res} otherwise,
          where SE = {w ∈ dom(τ′) | w shares with v} \ NC_{τ′}.

  NCC^I_τ⟦if v = w then com1 else com2⟧(nc) =
    NCC^I_τ⟦com1⟧(nc ∪ {v, w}) ∩ NCC^I_τ⟦com2⟧(nc) if v ∈ nc or w ∈ nc,
    and NCC^I_τ⟦com1⟧(nc) ∩ NCC^I_τ⟦com2⟧(nc) otherwise.

  NCC^I_τ⟦if v = null then com1 else com2⟧(nc) = NCC^I_τ⟦com1⟧(nc ∪ {v}) ∩ NCC^I_τ⟦com2⟧(nc)

  NCC^I_τ⟦{}⟧ = λnc ∈ NCτ. nc
  NCC^I_τ⟦{com1; . . . ; comp}⟧ = NCC^I_τ⟦com1⟧ ◦ · · · ◦ NCC^I_τ⟦comp⟧

Fig. 5. The abstract denotational semantics of the commands

The concrete denotations of the expressions null κ and new κ yield a final state σ′ which coincides with σ except on res, which holds null or a new object, respectively. In both cases res is non-cyclical, and we define nc′ = nc ∪ {res}. The concrete denotation of v computes σ′ by copying the value of v into res, leaving the other variables unaffected; hence res belongs to nc′ if and only if v ∈ nc. The same is correct for the cast (κ)v, whose concrete denotation coincides with that of v when it is defined (i.e., when the cast is legal). The concrete denotation of v.f loads into res the value of the field f of the object bound to v, if any; the other variables are not changed. If v is non-cyclical in σ, then res is non-cyclical in σ′, since whatever res holds is reached from a field of v. If, instead, we do not know whether v is non-cyclical in σ (i.e., if v ∉ nc), we can still guarantee that res is non-cyclical in σ′ if we know that the class F(τ(v))(f) of the field f is non-cyclical.
In conclusion, we let nc′ = nc ∪ {res} if v ∈ nc or F(τ(v))(f) is non-cyclical, while we conservatively assume nc′ = nc otherwise. The concrete denotation of the method call v.m(v1, ..., vn) first computes an input state σ† for the method. It consists of the input state σ restricted to v, v1, ..., vn, where v is renamed into this and each vi into the formal parameter wi. We mimic this behaviour on the abstract domain and define nc† accordingly. We then apply the interpretation of the method to this abstract input. In the concrete semantics, late-binding is resolved by using the run-time class κ of v. In the abstract semantics, we only know that κ ≤ τ(v). Hence we conservatively select all possible targets κ.m of the method call. The final state of the call is chosen to be consistent with all such targets, by using set intersection: nc′ = ∩{I(κ.m)(nc†) | κ.m can be called here}. If out is non-cyclical in nc′ then the value of the method call expression is non-cyclical as well, and res is added to the result; otherwise it is not. But, as Figure 4 shows, we also remove from nc the set SE of variables which share with at least one of v, v1, ..., vn and have cyclical type. This is because a method can modify, as a side effect, everything which is reachable from its formal parameters, and hence introduce cyclicity. Without sharing information, we could only conservatively assume that every pair of variables shares, whenever their types allow them to share. This definition can also be made more precise by including shadow copies of v, v1, ..., vn in the method body. They hold the initial values of such parameters and are never modified, so that at the end of the method call they provide explicit information on the cyclicity of v, v1, ..., vn. For commands, we have NCC^I_τ[[com]] : NCτ → NCτ, where I is a non-cyclicity interpretation and res ∉ dom(τ). The concrete denotation of v := exp first evaluates exp and then composes its denotation with a map setVar^v which copies res into v. The initial value of v is lost. This is mimicked on NC by an abstract map setVar^v. If variable res is non-cyclical, this map makes v non-cyclical as well. Otherwise it removes v from the set of non-cyclical variables, since we have no information on its cyclicity. The concrete denotation of v.f := exp uses, similarly, a map setField which updates σ by writing the value of exp, held in res, into the field f of v, thus yielding σ′. Hence, if res is non-cyclical and does not share with v, this operation can only remove cyclicity and we can safely assume nc′ = nc \ {res} (variable res is removed after the assignment). Similarly when the field f has non-cyclical type, so that we cannot reach a cycle from f. The non-sharing requirement is necessary since otherwise v might be made cyclical by closing a loop, as in v.f := v. If none of these cases applies, we might make cyclical the variables SE which share with v and have cyclical type (often v ∈ SE). The concrete denotation of the conditionals can be conservatively approximated by the greatest lower bound (i.e., set intersection, see Definition 11) of their two branches. We improve this approximation by taking the guard into account. If v = w holds then v and w are aliases and hence both cyclical or both non-cyclical. If v = null holds then v contains null and is hence non-cyclical.
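To make the rules of Figures 4 and 5 concrete, here is a small Python sketch of some of the transfer functions over NC, with abstract elements represented as Python sets of variable names. The helpers field_type_non_cyclical, SE and res_ok stand for the type and sharing information [8] that the analysis consumes; all names here are our own, not the paper's.

    # Sketch of a few transfer functions of Figures 4 and 5 on NC.

    def nce_new_or_null(nc):
        return nc | {'res'}                 # res holds null or a fresh object

    def nce_var_or_cast(nc, v):
        return nc | {'res'} if v in nc else nc

    def nce_field(nc, v, f, field_type_non_cyclical):
        if v in nc or field_type_non_cyclical(v, f):
            return nc | {'res'}
        return nc

    def set_var(nc, v):
        """setVar: copy res into v, then drop res (Figure 5)."""
        return (nc - {'res'}) | {v} if 'res' in nc else nc - {v, 'res'}

    def set_field(nc, v, f, SE, res_ok):
        """setField: res_ok abstracts the check `field f is non-cyclical,
        or res in nc and res does not share with v`; SE is the set of
        cyclically-typed variables sharing with v."""
        return nc - {'res'} if res_ok else (nc - SE) - {'res'}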
The concrete denotation of the sequential composition of commands is approximated by the functional composition of their abstract denotations.

Definition 13. The transformer on interpretations transforms a non-cyclicity interpretation I into a new non-cyclicity interpretation I′ such that I′(κ.m)(nc) = NCC^I_scope(κ.m)[[body(κ.m)]](nc ∪ {out, w_{n+1}, ..., w_{n+m}}) ∩ {out}. The denotational non-cyclicity semantics of a program is the least fixpoint of this transformer.

In Definition 13, the local variables {w_{n+1}, ..., w_{n+m}} are assumed to be non-cyclical at the beginning of the methods, since they are bound to null there.

Proposition 3. The non-cyclicity semantics of Definition 13 is correct w.r.t. the concrete semantics of Section 4.

The least fixpoint of Definition 13 is computed by repeated application of this transformer from the bottom interpretation [5, 3]. In our case, the bottom interpretation is given in Example 10. Example 11 shows that one application of the transformer reaches the fixpoint.

Example 11. Let us prove that the abstract interpretation I of Example 10 is a fixpoint of the transformer of Definition 13, i.e., that NCC^I_scope(κ.m)[[body(κ.m)]](nc ∪ {out, w_{n+1}, ..., w_{n+m}}) ∩ {out} = {out} for any nc ∈ NC_input(κ.m), where κ.m ranges over Subs.foreign and ForeignSubs.foreign (the methods monthlyCost are irrelevant since they work on primitive types). We can only have nc = ∅ or nc = {this}. By monotonicity of the denotations in Figures 4 and 5, we can just prove this result for nc = ∅, which is above {this} in NC, i.e., assuming that we do not know anything about the non-cyclicity of the variable this. Consider Subs.foreign. Let τ = scope(Subs.foreign) = [this ↦ Subs, temp ↦ Subs, out ↦ ForeignSubs]. Then NCC^I_τ[[temp := this.next]]({out, temp}) equals setVar^temp_{τ+this.next}(NCE^I_τ[[this.next]]({out, temp})), which is setVar^temp_{τ+this.next}({out, temp}) = {out}. Hence

NCC^I_τ[[temp := this.next; if temp = null then {} else out := temp.foreign()]]({out, temp})
= NCC^I_τ[[if temp = null then {} else out := temp.foreign()]](NCC^I_τ[[temp := this.next]]({out, temp}))
= NCC^I_τ[[if temp = null then {} else out := temp.foreign()]]({out})
= NCC^I_τ[[{}]]({temp, out}) ∩ NCC^I_τ[[out := temp.foreign()]]({out})
= {temp, out} ∩ setVar^out_{τ+temp.foreign()}(NCE^I_τ[[temp.foreign()]]({out})).

In this call, nc† = ∅ and τ(temp) = Subs. Then nc′ = I(Subs.foreign)(∅) ∩ I(ForeignSubs.foreign)(∅) = {out} ∩ {out} = {out}. A sharing analysis, such as that in [8], proves that temp shares only with this and with temp itself here (out holds null). So SE = {this, temp} and NCE^I_τ[[temp.foreign()]]({out}) = ({out} \ SE) ∪ {res} = {out, res}, and the equation above becomes {temp, out} ∩ setVar^out_{τ+temp.foreign()}({out, res}) = {temp, out} ∩ {out} = {out}. The result for ForeignSubs.foreign is shown in Figure 6, where we report the approximation before and after each statement:

{out, sub, temp}
  sub := this.subscriber
{out, sub, temp}
  out := new ForeignSubs
{out, sub, temp}
  temp := this.next
{out, sub}
  if (temp = null) then {} else out.next := temp.foreign()
{out, sub}
  out.subscriber := sub
{out, sub}

Fig. 6. The analysis of method ForeignSubs.foreign

We do not consider the statement out.numOfChannels := this.numOfChannels in Figure 1 since it deals with primitive types, which are not relevant for us. The result is {out, sub}, i.e., out and sub are non-cyclical. Its restriction to out (Definition 13) is {out}, as expected.
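The repeated application of the transformer of Definition 13 is a standard Kleene iteration over interpretations. A minimal sketch, assuming hypothetical helpers inputs(m) (the abstract inputs of method m, as frozensets), least_output(m) (the least element of NC_output(m)) and transform(I, m, nc) (the abstract denotation of m's body under I, restricted to out):

    # Sketch of the least-fixpoint computation of Definition 13.

    def lfp(methods, inputs, least_output, transform):
        # bottom interpretation: every input maps to the least element
        I = {m: {nc: least_output(m) for nc in inputs(m)} for m in methods}
        while True:
            I_next = {m: {nc: transform(I, m, nc) for nc in inputs(m)}
                      for m in methods}
            if I_next == I:      # fixpoint reached (one step in Example 11)
                return I
            I = I_next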
The assignment out.next := temp.foreign() maintains out's non-cyclicity since the value of the right-hand side is non-cyclical (as we have computed above) and does not share with out, as can be proved for instance through the analysis in [8]. So the first case of the definition of setField in Figure 5 applies. The assignment out.subscriber := sub maintains out's non-cyclicity since the field subscriber has the non-cyclical type Person. Again the first case of the definition of setField applies. Example 11 shows that our analysis is able to prove non-cyclicity in a non-trivial situation. Example 12 shows instead that, correctly, non-cyclicity is lost when a variable is bound to a cyclic data-structure.

Example 12. Let only v be in scope. At the end of the piece of code v := new Subs; v.next := v, variable v is cyclical. Our analysis reflects this since, if we start it for instance from ∅ (no variable is definitely non-cyclical), then the approximation {v} is computed after the first statement, and ∅ after the second. Here, we used the rule for v.f := exp in Figure 5 with SE = {res, v}, since res, i.e., the value of v, shares with v.

We conclude with Example 13, which shows a false alarm, i.e., a situation where our analysis is too conservative and is not able to prove non-cyclicity.

Example 13. Let only v be in scope. At the end of the piece of code v := new Subs; v.next := new Subs; v.next := v.next, variable v is non-cyclical. If we start for instance our analysis from ∅, we compute {v} after the first and second statement. For the last one we apply the rule for v.f := exp in Figure 5 with SE = {res, v}, since res, i.e., the value of v.next, shares with v. The result, as in Example 12, is ∅, i.e., the analysis is too conservative to prove that v is non-cyclical.

7 Compilation into Boolean Functions

Figures 4 and 5 report maps over NC, i.e., over sets of variables. They can be represented through Boolean functions (or formulas), i.e., functions over Boolean variables, which can then be efficiently implemented through binary decision diagrams [4]. The idea is that two Boolean variables v̌ and v̂ represent the non-cyclicity of program variable v in the input, respectively the output, of a denotation.

Definition 14. A non-cyclicity denotation d : NCτ → NCτ′ is represented by a Boolean function φ over the variables {v̌ | v ∈ dom(τ)} ∪ {v̂ | v ∈ dom(τ′)} iff for every nc ∈ NCτ we have d(nc) = nc′ ⇔ {v | (∧{v̌ | v ∈ nc} ∧ φ) |= v̂} = nc′, where |= is logical consequence or entailment.

Example 14. Let τ be as in Figure 3. The denotation d : NCτ → NCτ such that d({this, sub}) = {out, sub} and d(nc) = {sub} for any nc ≠ {this, sub} is represented by φ = ((thǐs ∧ ¬těmp ∧ ¬oǔt ∧ sǔb) → oût) ∧ sûb.

Composition d1 ∘ d2 of non-cyclicity denotations d1 and d2 is represented by the Boolean function b1 ∘ b2 = ∃v′ (b1[v̂ ↦ v′] ∧ b2[v̌ ↦ v′]), where b1 is the Boolean representation of d1, b2 that of d2, b1[v̂ ↦ v′] renames the output variables of b1 into new temporary, primed variables, b2[v̌ ↦ v′] renames the input variables of b2 into the same temporaries, and ∃v′ removes such temporaries through Schröder elimination [2]: ∃x φ = φ[x ↦ true] ∨ φ[x ↦ false].
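The composition and Schröder elimination above can be prototyped directly on formulas; a production implementation would use BDDs [4] instead. A minimal sketch of ours using sympy for the Boolean algebra (the variable-naming scheme v_check / v_hat / v_tmp is our own convention for v̌, v̂ and the temporaries):

    # Sketch of composition of Boolean functions via Schroeder elimination.

    from sympy import symbols, Or, And, Implies, simplify_logic

    def exists(x, phi):
        # Schroeder elimination: Ex. phi = phi[x -> true] v phi[x -> false]
        return Or(phi.subs(x, True), phi.subs(x, False))

    def compose(b1, b2, prog_vars):
        # b1 o b2: rename each output variable of b1 and the matching
        # input variable of b2 to a fresh temporary, conjoin, project away.
        for v in prog_vars:
            v_hat, v_check, v_tmp = symbols(f"{v}_hat {v}_check {v}_tmp")
            b1 = b1.subs(v_hat, v_tmp)
            b2 = b2.subs(v_check, v_tmp)
        phi = And(b1, b2)
        for v in prog_vars:
            phi = exists(symbols(f"{v}_tmp"), phi)
        return phi

    # The paper's own example of composition (Example 16, below):
    out_hat, out_check, res_hat = symbols("out_hat out_check res_hat")
    print(simplify_logic(compose(out_hat, Implies(out_check, res_hat),
                                 ["out"])))        # prints: res_hat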
Figure 7 reports the Boolean functions for the non-cyclicity denotations of Figures 4 and 5:

BE_τ[[null κ]] = BE_τ[[new κ]] = rês ∧ U(dom(τ))
BE_τ[[v]] = BE_τ[[(κ)v]] = (v̌ → rês) ∧ U(dom(τ))
BE_τ[[v.f]] = rês ∧ U(dom(τ)) if F(τ(v))(f) is non-cyclical, and (v̌ → rês) ∧ U(dom(τ)) otherwise
BE_τ[[v.m(v1, ..., vn)]] = U(dom(τ) \ SE) ∧ [(v̌ → thîs ∧ v̌1 → ŵ1 ∧ ... ∧ v̌n → ŵn) ∘ ∨{I(κ.m) | κ.m can be called here} ∘ (oǔt → rês)]
BC_τ[[v := exp]] = BE_τ[[exp]] ∘ setVar^v_{τ+exp}, with setVar^v_{τ′} = (rěs → v̂) ∧ U(dom(τ′) \ {res, v})
BC_τ[[v.f := exp]] = BE_τ[[exp]] ∘ setField^{v.f}_{τ+exp}, with setField^{v.f}_{τ′} = U((dom(τ′) \ SE) \ {res}) if res and v share and F(τ′(v))(f) is cyclical, and (∧{(rěs ∧ w̌) → ŵ | w ∈ SE}) ∧ U((dom(τ′) \ SE) \ {res}) otherwise (SE as in Figures 4 and 5)
BC_τ[[if v = w then com1 else com2]] = ((v̌ ↔ w̌) ∧ BC_τ[[com1]]) ∨ BC_τ[[com2]]
BC_τ[[if v = null then com1 else com2]] = (v̌ ∧ BC_τ[[com1]]) ∨ BC_τ[[com2]]
BC_τ[[{}]] = U(dom(τ)),  BC_τ[[{com1; ...; comp}]] = BC_τ[[com1]] ∘ ··· ∘ BC_τ[[comp]]

Fig. 7. Compilation rules from our language into Boolean functions

The frame condition U(vars) = ∧{v̌ → v̂ | v ∈ vars} states that the variables in vars do not change. Interpretations I now map methods to Boolean functions representing their non-cyclicity denotation. You can see Figure 7 as a set of compilation rules from the language of Section 4 into Boolean functions. Let us consider Figure 7. The representation of new κ and null κ is a Boolean function stating that res is non-cyclical in the output. All other variables are unchanged. That of v and (κ)v propagates the non-cyclicity of v into that of res. The representation of v.f depends on the non-cyclicity of F(τ(v))(f), which can be checked at analysis-time. The representation of a method call is the composition of a Boolean function matching the actual parameters with the formal ones, a Boolean function that fetches the interpretations of the methods which might be called (I is fed later), and a Boolean function which renames out into res. A frame condition expresses that no variable is changed by the call, except those in SE (Figure 4). Assignments use, as in Figure 5, maps (now formulas) setVar and setField. The latter checks, at analysis-time, whether res shares with v and whether field f has cyclical type. The representation of the conditionals is the disjunction of their two branches, but we improve the information available on the then branch, exactly as in Figure 5. Namely, if v = w holds then v and w are both cyclical or both non-cyclical. If v = null holds then v is non-cyclical.

Example 15. Figure 2 shows the application of the compilation rules in Figure 7 to the abstract compilation of the method ForeignSubs.foreign in Figure 1.

Interpretations over Boolean functions are updated by a transformer which is the compilation into Boolean functions of that of Definition 13.

Definition 15. The transformer on non-cyclicity interpretations transforms a non-cyclicity interpretation I′ into I″ (both represented through Boolean functions) such that I″(κ.m) = (U(dom(input(κ.m))) ∧ oût ∧ ŵ_{n+1} ∧ ... ∧ ŵ_{n+m}) ∘ φ_{κ.m}[I ↦ I′] ∘ (oǔt → oût), where φ_{κ.m}[I ↦ I′] is the compiled body (formula) of method κ.m with I′ plugged in instead of I. The denotational non-cyclicity semantics over Boolean functions of a program is the least fixpoint of this transformer.

Proposition 4. The denotational non-cyclicity semantics over Boolean functions represents (Definition 14) the non-cyclicity semantics of Definition 13.
Example 16. The bottom interpretation of Example 10 is represented as I(κ.m) = oût for every κ.m. Formula φ_{Subs.foreign} is in Figure 2. Then I′(Subs.foreign) = (((thǐs → thîs) ∧ oût ∧ têmp) ∘ (φ1 ∘ (φ2 ∨ φ3[I ↦ I]))) ∘ (oǔt → oût). To compute φ3[I ↦ I], plug oût instead of each occurrence of I. We have φ3[I ↦ I] = (((těmp → thîs) ∘ oût ∘ (oǔt → rês)) ∧ (oǔt → oût)) ∘ ((rěs → oût) ∧ (thǐs → thîs) ∧ (těmp → têmp)). We show an example of (associative) composition: oût ∘ (oǔt → rês) = ∃out′ (out′ ∧ (out′ → rês)) = (out′ ∧ (out′ → rês))[out′ ↦ true] ∨ (out′ ∧ (out′ → rês))[out′ ↦ false] = rês. Continuing the calculation, we have φ3[I ↦ I] = (rês ∧ (oǔt → oût)) ∘ ((rěs → oût) ∧ (thǐs → thîs) ∧ (těmp → têmp)) = oût. It can also be checked that φ1 ∘ (φ2 ∨ φ3[I ↦ I]) = oût. Hence I′(Subs.foreign) = ((thǐs → thîs) ∧ têmp ∧ oût) ∘ oût ∘ (oǔt → oût) = oût. The same can be computed for ForeignSubs.foreign. In conclusion, I′ = I is the least fixpoint of the transformer of Definition 15 (compare with Example 11).

8 Conclusion

We have applied abstract compilation into Boolean formulas to define a static analysis which detects non-cyclicity of program variables. This leads to an elegant and relational formulation of the analysis and, prospectively, to an implementation through binary decision diagrams [4], which is considered very important for building efficient implementations of static analyses [2].

References

1. K. R. Apt and D. Pedreschi. Reasoning about Termination of Pure Prolog Programs. Information and Computation, 106(1):109–157, September 1993.
2. T. Armstrong, K. Marriott, P. Schachte, and H. Søndergaard. Two Classes of Boolean Functions for Dependency Analysis. Science of Computer Programming, 31(1):3–45, 1998.
3. A. Bossi, M. Gabbrielli, G. Levi, and M. Martelli. The s-Semantics Approach: Theory and Applications. Journal of Logic Programming, 19/20:149–197, 1994.
4. R. E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, 35(8):677–691, 1986.
5. P. Cousot and R. Cousot. Abstract Interpretation and Applications to Logic Programs. Journal of Logic Programming, 13(2 & 3):103–179, 1992.
6. M. Hermenegildo, W. Warren, and S. K. Debray. Global Flow Analysis as a Practical Compilation Tool. Journal of Logic Programming, 13(2 & 3):349–366, 1992.
7. S. Rossignoli and F. Spoto. Detecting Non-Cyclicity by Abstract Compilation into Boolean Functions (extended version with proofs). Available at http://www.sci.univr.it/~spoto/papers.html, 2005.
8. S. Secci and F. Spoto. Pair-Sharing Analysis of Object-Oriented Programs. In C. Hankin and I. Siveroni, editors, Static Analysis Symposium (SAS), volume 3672 of Lecture Notes in Computer Science, pages 320–335, London, UK, 2005.
9. B. Steensgaard. Points-to Analysis in Almost Linear Time. In Principles of Programming Languages (POPL), pages 32–41, St. Petersburg Beach, Florida, USA, January 1996.
10. R. Wilhelm, T. W. Reps, and S. Sagiv. Shape Analysis and Applications. In Y. N. Srikant and P. Shankar, editors, The Compiler Design Handbook, pages 175–218. CRC Press, 2002.
11. G. Winskel. The Formal Semantics of Programming Languages. The MIT Press, 1993.

Efficient Strongly Relational Polyhedral Analysis

Sriram Sankaranarayanan(1,3), Michael A.
Colón(2), Henny Sipma(3), and Zohar Manna(3)

(1) NEC Laboratories America, Princeton, NJ. srirams@nec-labs.com
(2) Center for High Assurance Computer Systems, Naval Research Laboratory. colon@itd.nrl.navy.mil
(3) Computer Science Department, Stanford University, Stanford, CA 94305-9045. {sipma, zm}@theory.stanford.edu

Abstract. Polyhedral analysis infers invariant linear equalities and inequalities of imperative programs. However, the exponential complexity of polyhedral operations such as image computation and convex hull limits the applicability of polyhedral analysis. Weakly relational domains such as intervals and octagons address the scalability issue by considering polyhedra whose constraints are drawn from a restricted, user-specified class. On the other hand, these domains rely solely on the candidate expressions provided by the user. Therefore, they often fail to produce strong invariants. We propose a polynomial time approach to strongly relational analysis. We provide efficient implementations of the join and post condition operations, achieving a trade-off between performance and accuracy. We have implemented a strongly relational polyhedral analyzer for a subset of the C language. Initial experimental results on benchmark examples are encouraging.

1 Introduction

Polyhedral analysis seeks to discover invariant linear equality and inequality relationships among the variables of an imperative program. The computed invariants are used to establish safety properties such as freedom from buffer overflows. The standard approach to polyhedral analysis is a fixed point iteration in the domain of convex polyhedra [9]. Complexity considerations, however, restrict its application to small systems. Libraries such as NewPolka [13] and PPL [2] have made strides towards addressing some of these tractability issues, but the approach still remains impractical for large systems. At the heart of this intractability lies the need to repeatedly convert between constraint and generator representations of polyhedra. Efficient analysis techniques work on restricted forms of polyhedra wherein such a conversion can be avoided. (This research was supported in part by NSF grants CCR-01-21403, CCR-02-20134, CCR-02-09237, CNS-0411363 and CCF-0430102, by ARO grant DAAD 19-01-1-0723 and by NAVY/ONR contract N00014-03-1-0939 and the Office of Naval Research.) Weakly relational domains such as octagons [17], intervals [7], octahedra [6] and the TCM domain [18] avoid these conversions by considering polyhedra whose constraints are fixed a priori. The abstract domain of Simon et al. [19] considers polyhedra with at most two variables per constraint. Using these syntactic restrictions, the analysis can be carried out efficiently. However, the main drawback of such syntactic restrictions is the inability of the analysis to infer invariants that require expressions of an arbitrary form. Thus, in many cases, such domains may fail to prove the property of interest. In this paper, we provide an efficient strongly relational polyhedral domain by drawing on ideas from both weak and strong relational analysis. We present alternatives to the join and post condition operations. In particular, we provide a new join algorithm, called the inversion join, that works in polynomial time in the size of the input polyhedra, as opposed to the exponential-space polyhedral join.
We make use of linear programming to implement efficient join and post condition operators, along with efficient inclusion checks and widening operators. On the other hand, our domain operations are weaker than the conventional polyhedral domain operations, potentially yielding weaker invariants. Using a prototype implementation of our techniques, we have analyzed several sorting and string handling routines for buffer overflows. Our initial results are promising; our analysis performs better than the standard approaches while computing invariants that are sufficiently strong in practice.

Outline. Section 2 discusses the preliminary notions of polyhedra, transition systems and invariants. Section 3 discusses algorithms for the domain operations needed for polyhedral analysis. Section 4 discusses the implementation and the results obtained on benchmark examples.

2 Preliminaries

We recall some standard results on polyhedra, followed by a brief description of system models and abstract interpretation. Throughout the paper, let R represent the set of reals and R+ = R ∪ {±∞} the extended real numbers.

Definition 1 (Linear Assertions). A linear expression e is of the form a1x1 + ··· + anxn + b, wherein each ai ∈ R and b ∈ R+. The expression is said to be homogeneous if b = 0. A linear constraint is of the form a1x1 + ··· + anxn + b ⋈ 0, with ⋈ ∈ {≥, =}. A linear assertion is a finite conjunction of linear constraints.

Note that the linear inequality e + ∞ ≥ 0 represents the assertion true, whereas the inequality e − ∞ ≥ 0 represents false. Since each equality e = 0 can be represented as a conjunction of two inequalities, an assertion can be written in matrix form as Ax + b ≥ 0, where A is an m × n matrix, while x and b are n- and m-dimensional vectors, respectively. The set of points in R^n satisfying a linear assertion is called a polyhedron. The representation of a polyhedron by a linear assertion is known as its constraint representation. Alternatively, a polyhedron may be represented explicitly by a finite set of vertices and rays, known as the generator representation. Each representation may be exponentially larger than the other. For instance, the n-dimensional hypercube is represented by 2n constraints but 2^n generators. Efficient libraries of conversion algorithms such as New Polka [12] and the Parma Polyhedra Library (PPL) [2] have made significant improvements to the size of the polyhedra for which the conversion is possible. Nevertheless, this conversion remains intractable for large polyhedra involving hundreds of variables and constraints. A Template Constraint Matrix (TCM) T is a finite set of homogeneous linear expressions over x. Given an assertion ϕ, its expressions induce a TCM which we shall denote Ineqs(ϕ). If ϕ is represented as Ax + b ≥ 0 then Ineqs(ϕ) : Ax.

Linear Programming. We briefly recall the theory of linear programming. Details may be found in standard textbooks [5].

Definition 2 (Linear Programming). A canonical instance of the linear programming (LP) problem is of the form: minimize e subject to ϕ, for an assertion ϕ and a linear expression e, called the objective function. The goal is to determine the solution of ϕ for which e is minimal. An LP problem can have one of three results: (1) an optimal solution; (2) −∞, i.e., e is unbounded from below in ϕ; (3) +∞, i.e., ϕ has no solutions. It is well known that an optimal solution, if it exists, is realized at a vertex of the polyhedron.
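As an illustration of Definition 2, the following snippet solves a small LP instance with SciPy's linprog; this solver choice is ours for the sketch (the implementation described later uses GLPK [15]). The three possible outcomes of the definition correspond to the solver's optimal, unbounded and infeasible statuses.

    # Minimize e = x - y subject to x - y <= 5, x + y <= 10, -10 <= x <= 5.
    from scipy.optimize import linprog

    res = linprog(c=[1, -1],                    # objective: x - y
                  A_ub=[[1, -1], [1, 1]],       # x - y <= 5, x + y <= 10
                  b_ub=[5, 10],
                  bounds=[(-10, 5), (None, None)])
    if res.status == 3:
        print("unbounded: e has no lower bound (-oo)")
    elif res.status == 2:
        print("infeasible: the assertion has no solutions (+oo)")
    else:
        print("optimal value:", res.fun)        # -30.0 here, at a vertex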
Therefore, the optimal solution can be found by evaluating e at each of the vertices. However, enumerating all the vertices is very inefficient, because the number of generators is worst-case exponential in the number of constraints. The popular simplex algorithm (due to Dantzig [10]) employs a sophisticated hill-climbing strategy that converges on an optimal vertex without necessarily enumerating all vertices. In theory, the technique is worst-case exponential; in practice, the simplex method is efficient on most problems. Interior point methods such as Karmarkar's algorithm and other techniques based on ellipsoidal approximations are guaranteed to solve linear programs in polynomial time. Using an open-source implementation of simplex such as glpk [15], massive LP instances involving tens of thousands (10^4 and beyond) of variables and constraints can be solved efficiently.

Programs and Invariants. We assume programs over real-valued variables without any function calls. A program is represented by a linear transition system, also known as a control flow graph.

Definition 3 (Linear Transition Systems). A linear transition system (LTS) Π : ⟨L, T, ℓ0, Θ⟩ over a set of variables V consists of
– L: a set of locations (cutpoints);
– T: a set of transitions (edges), where each transition τ : ⟨ℓi, ℓj, ρτ⟩ consists of a pre-location ℓi, a post-location ℓj, and a transition relation ρτ, represented as a linear assertion over V ∪ V′, where V denotes the values of the variables in the current state, and V′ their values in the next state;
– ℓ0 ∈ L: the initial location;
– Θ: a linear assertion over V specifying the initial condition.

A run of an LTS is a sequence ⟨m0, s0⟩, ⟨m1, s1⟩, ..., with mi ∈ L and si a valuation of V, also called a state, such that
– Initiation: m0 = ℓ0, and s0 |= Θ;
– Consecution: for all i ≥ 0 there exists a transition τ : ⟨ℓj, ℓk, ρτ⟩ such that mi = ℓj, mi+1 = ℓk, and ⟨si, si+1⟩ |= ρτ.

A state s is reachable at location ℓ if ⟨ℓ, s⟩ appears in some run. A given linear assertion ψ is a linear invariant of a linear transition system (LTS) at a location ℓ iff it is satisfied by every state reachable at ℓ. An assertion map associates each location of an LTS with a linear assertion. An assertion map η is invariant if η(ℓ) is an invariant at ℓ, for each ℓ ∈ L. In order to prove a given assertion map invariant, we use the inductive assertions method due to Floyd (see [16]).

Definition 4 (Inductive Assertion Maps). An assertion map η is inductive iff it satisfies the following conditions:
Initiation: Θ |= η(ℓ0);
Consecution: for each transition τ : ⟨ℓi, ℓj, ρτ⟩, (η(ℓi) ∧ ρτ) |= η(ℓj)′.
Note that η(ℓj)′ refers to η(ℓj)[V ↦ V′], with the variables in V substituted by their corresponding primed variables in V′.

It is well known that any inductive assertion map is invariant. However, the converse need not be true. The standard technique for proving an assertion invariant is to find an inductive assertion that strengthens it.

Linear Relations Analysis. Linear relations analysis seeks an inductive assertion map for the input program, labeling each location with a linear assertion. The analysis techniques are based on the theory of Abstract Interpretation [8], specialized for linear relations by Cousot and Halbwachs [9]. The technique starts with an initial assertion map and weakens it iteratively using the post condition, join and widening operators. When the iteration converges, the resulting map is guaranteed to be inductive, and hence invariant.
Termination is guaranteed by the design of the widening operator. The post condition operator takes an assertion ϕ and a transition τ, and computes the set of states reachable by τ from a state satisfying ϕ. It can be expressed as

post(ϕ, τ) : (∃V0)(ϕ(V0) ∧ ρτ(V0, V)).

Standard polyhedral operations can be used to compute post. However, more efficient strategies for computing post exist when ρτ has a special structure. Given assertions ϕ1, ϕ2 such that ϕ1 |= ϕ2, the standard widening ϕ1∇ϕ2 is an assertion ϕ that contains all the inequalities in ϕ1 that are satisfied by ϕ2. The details, along with the key mathematical properties of widening, are described in [9, 8]; enhanced versions appear in [12, 4, 1]. As mentioned earlier, the analysis begins with an initial assertion map defined by η0(ℓ0) = Θ, and η0(ℓ) = false for ℓ ≠ ℓ0. At each step, the map ηi is updated to the map ηi+1 as follows:

ηi+1(ℓ) = ηi(ℓ) op [ηi(ℓ) ⊔ ⊔{post(ηi(ℓj), τj) | τj ≡ ⟨ℓj, ℓ, ρ⟩}],

where op is the join (⊔) operator for a propagation step, and the widening (∇) operator for a widening step. The overall algorithm requires a predefined iteration strategy. A typical strategy carries out a fixed number of initial propagation steps, followed by widening steps until termination.

Linear Assertion Domains. Linear relations analysis is performed using a forward propagation wherein polyhedra are used to represent sets of states. Depending on the family of polyhedra considered, such domains are classified as weakly relational or strongly relational. Let T = {e1, ..., em} be a TCM. The weakly relational domain induced by T consists of the assertions ∧{ei + bi ≥ 0 | ei ∈ T} for bi ∈ R+. TCMs and their induced weakly relational domains are formalized in our earlier work [18]. Given a weakly relational domain defined by a TCM T and a linear transition system Π, we seek an inductive assertion map η such that η(ℓ) belongs to the domain of T for each location ℓ. Many weakly relational domains have been studied: intervals, octagons and octahedra are classical examples.

Example 1 (Weakly Relational Analysis). Let X be the set of system variables. The interval domain is defined by the TCM consisting of the expressions TX = {±xi | xi ∈ X}. Thus, any polyhedron belonging to the domain is an interval expression of the form ∧(xi + ai ≥ 0 ∧ −xi + bi ≥ 0). The goal of interval analysis is to discover the coefficients ai, bi ∈ R+ representing the bounds for each variable xi at each location of the program [7]. The octagon domain of Miné subsumes the interval domain by considering additional expressions of the form ±xi ± xj with xi, xj ∈ X [17]. The octahedron domain of Clarisó and Cortadella considers expressions of the form Σ ai xi with ai ∈ {−1, 0, 1} [6]. It is possible to carry out the analysis in any weakly relational domain efficiently [18].

Theorem 1. Given a TCM T and a linear system Π, all the domain operations for the weakly relational analysis of Π in the domain induced by T can be performed in time polynomial in |T| and |Π|.

integer x, y where (x = 1 ∧ x ≥ y)
ℓ0: while true do
  if (x ≥ y) then
    (x, y) := (x + 2, y + 1)
  else
    (x, y) := (x + 2, y + 3)
  end if
end while

Fig. 1. An example program

Weakly relational domains are appealing since the analysis in these domains scales to large systems. On the other hand, the invariants they produce are often imprecise.
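The iteration scheme above can be written as a short skeleton. The LTS interface (lts.locations, lts.incoming(l), t.pre, lts.theta, lts.false) is our own hypothetical one; post, join, widen and leq stand for the domain operations discussed in Section 3.

    # Skeleton of the forward-propagation loop: K propagation (join)
    # steps, then widening steps until stabilization.
    import itertools

    def analyze(lts, post, join, widen, leq, K=3):
        eta = {l: lts.false for l in lts.locations}
        eta[lts.l0] = lts.theta
        for step in itertools.count():
            new = {}
            for l in lts.locations:
                acc = eta[l]
                for t in lts.incoming(l):          # transitions into l
                    acc = join(acc, post(eta[t.pre], t))
                # propagation for the first K steps, widening afterwards
                new[l] = acc if step < K else widen(eta[l], acc)
            if all(leq(new[l], eta[l]) for l in lts.locations):
                return new                         # stabilized: inductive map
            eta = new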
For instance, even if ei + ai ≥ 0 is invariant for some expression ei in the TCM, its proof may require an inductive strengthening ej + aj ≥ 0, where ej is not in the TCM. A strongly relational analysis does not syntactically restrict the polyhedra considered. The polyhedral domain is not restricted in its choice of invariant expressions, and is potentially more precise than a weakly relational domain. The main drawback, however, is the high complexity of the domain operations. Each domain operation requires conversions from the constraint to the generator representation and back. Popular implementations of strongly relational analysis require worst-case exponential space due to repeated representation conversions.

Example 2. Consider the system in Figure 1. The interval and octagon domains both discover the invariant ∞ ≥ x ≥ 1 at location ℓ0. A strongly relational analysis such as polyhedral analysis discovers the invariant x ≥ 1 ∧ 3x − 2y ≥ 1, as does the technique that we present.

3 Domain Operations

The theory of Abstract Interpretation provides a framework for the design of program analyses. A sound program analysis can be designed by constructing an abstract domain with the following domain operations:

Join (union). Given two assertions ϕ1, ϕ2 in the domain, we seek an assertion ϕ such that ϕ1 |= ϕ and ϕ2 |= ϕ. In many domains, it is possible to find the strongest possible ϕ satisfying this condition. The operation computing such an assertion is called the strong join.

Post Condition. Given an assertion ϕ and a transition relation ρτ, we seek an assertion ψ such that ϕ[V] ∧ ρτ[V, V′] |= ψ[V′]. A strong post condition operator computes the strongest possible assertion ψ satisfying this condition.

Widening. Widening ensures the termination of the fixed point iteration.

Additionally, inclusion tests between assertions are important for detecting the termination of an iteration. Feasibility tests and redundancy elimination are also frequently used to speed up the analysis. We present several different join and post condition operations, each achieving a different trade-off between efficiency and precision.

Join. Given two linear assertions ϕ1 and ϕ2 over a vector x of system variables, we seek a linear assertion ϕ such that both ϕ1 |= ϕ and ϕ2 |= ϕ.

Strong Join. The strong join seeks the strongest assertion ϕ (denoted ϕ1 ⊔s ϕ2) subsuming both ϕ1 and ϕ2. In the domain of convex polyhedra, this is known as the polyhedral convex hull and is obtained by computing the generator representations of ϕ1 and ϕ2. The set of generators of ϕ is the union of those of ϕ1 and ϕ2. This representation is then converted back into the constraint representation. Due to the repeated representation conversions, the strong join is worst-case exponential space in the size of the input assertions.

Example 3. Consider the assertions
ϕ1 : x − y ≤ 5 ∧ y + x ≤ 10 ∧ −10 ≤ x ≤ 5
ϕ2 : x − y ≤ 9 ∧ y + x ≤ 5 ∧ −9 ≤ x ≤ 6.
Their strong join ϕ1 ⊔s ϕ2, generated by the union of their vertices, is
ϕ : 6x + y ≤ 35 ∧ y + 3x + 45 ≥ 0 ∧ x − y ≤ 9 ∧ x + y ≤ 10 ∧ −10 ≤ x ≤ 6.

Weak Join. The weak join operation is inspired by the join used in weakly relational domains.

Definition 5 (Weak Join). The weak join of two polyhedra ϕ1, ϕ2 is computed as follows:
1. Let the TCM T = Ineqs(ϕ1) ∪ Ineqs(ϕ2) be the set of inequality expressions that occur in either of ϕ1, ϕ2. Recall that each equality in ϕ1 or ϕ2 is represented by two inequalities in T.
2. For each expression ei in T, compute the values ai and bi using linear programming:
   ai = minimize ei subject to ϕ1,
   bi = minimize ei subject to ϕ2.
It follows that ϕ1 |= (ei ≥ ai) and ϕ2 |= (ei ≥ bi).
3. Let ci = min(ai, bi). Then both ϕ1, ϕ2 |= (ei ≥ ci). The weak join ϕ1 ⊔w ϕ2 is the assertion ∧{ei ≥ ci | ei ∈ T}.

The advantage of the weak join is its efficiency: it can be computed using LP queries, where both the number of such queries and the size of each individual query are polynomial in the input size. On the other hand, the weak join does not discover any new relations. It is weaker than the strong join, as shown by the argument above (and strictly so, as shown by the following example).

Example 4. Consider the assertions ϕ1, ϕ2 from Example 3 above. The TCM T and the ai, bi values are shown in the table below:

#  Relation   ai (ϕ1)  bi (ϕ2)
1  y − x      −5       −9
2  −y − x     −10      −5
3  x          −10      −9
4  −x         −5       −6

The weak join is ϕw : (y − x ≥ −9 ∧ −y − x ≥ −10 ∧ x ≥ −10 ∧ −x ≥ −6). This result is strictly weaker than the strong join computed in Example 3.

Restricted Joins. The weak join shown above is more efficient than the strong join. However, this efficiency comes at the cost of precision. We therefore seek efficient alternatives to the strong and weak joins. The k-restricted join (denoted ⊔k) improves upon the weak join as follows:
1. Choose a subset of inequalities from each of ϕ1, ϕ2, of cardinality at most k. Let ψ1 and ψ2 be the assertions formed by the chosen inequalities. In general, ψ1, ψ2 may contain different sets of inequalities, even of different cardinalities. Note that ϕi |= ψi for i = 1, 2.
2. Compute the strong join ψ1 ⊔s ψ2 in isolation. Conjoin the result with the weak join ϕ1 ⊔w ϕ2.
3. Repeat step 1 for the other choices of ψ1, ψ2, conjoining each such join to the weak join.

Since ϕi |= ψi for i = 1, 2, it follows by the monotonicity of the strong join operation that ϕ1 ⊔s ϕ2 |= ψ1 ⊔s ψ2. Thus ϕ1 ⊔s ϕ2 |= ϕ1 ⊔k ϕ2 for each k ≥ 0. Let ϕ1, ϕ2 have at most m constraints each. The k-restricted join requires vertex enumeration for O((m choose k)^2) polyhedra with at most k constraints. As such, this join is efficient only if k is a small constant. We now provide an efficient O(m^2) algorithm based on ⊔2 to improve the weak join.
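The weak join of Definition 5 reduces to a sequence of LP queries. A sketch of ours, representing an assertion as a list of rows (a, b) standing for a·x + b ≥ 0, with SciPy's linprog standing in for GLPK:

    # Sketch of the weak join (Definition 5).
    import numpy as np
    from scipy.optimize import linprog

    def lower_bound(a, phi):
        """Minimize a.x over phi, given as rows (row, b): row.x + b >= 0,
        i.e. (-row).x <= b for the solver."""
        A = np.array([row for row, _ in phi], dtype=float)
        b = np.array([c for _, c in phi], dtype=float)
        res = linprog(c=a, A_ub=-A, b_ub=b, bounds=[(None, None)] * len(a))
        if res.status == 2:       # infeasible assertion: minimum is +oo
            return np.inf
        if res.status == 3:       # objective unbounded from below
            return -np.inf
        return res.fun

    def weak_join(phi1, phi2):
        T = [row for row, _ in phi1] + [row for row, _ in phi2]
        out = []
        for e in T:
            c = min(lower_bound(e, phi1), lower_bound(e, phi2))
            if np.isfinite(c):
                out.append((e, -c))     # e >= c, stored as e + (-c) >= 0
        return out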
Inversion Join. The inversion join is based on the 2-restricted join. Let T be the TCM and ai, bi the values computed for the weak join as in Definition 5. Consider pairs of expressions ei, ej ∈ T, yielding the assertions
ψ1 : ei ≥ ai ∧ ej ≥ aj and ψ2 : ei ≥ bi ∧ ej ≥ bj.
We use the structure of the assertions ψ1, ψ2 to perform their strong join analytically. The key notion is that of an inversion.

Definition 6 (Inversion). Expressions ei, ej ∈ T and corresponding coefficients ai, aj, bi, bj form an inversion iff the following conditions hold:
1. ai, aj, bi, bj ∈ R, i.e., none of them is ±∞;
2. ei ≠ λej for every λ ∈ R, i.e., ei and ej are linearly independent;
3. ai < bi and bj < aj (or vice versa).

Example 5. Consider two "wedges" ψ1 : ei ≥ ai ∧ ej ≥ aj and ψ2 : ei ≥ bi ∧ ej ≥ bj. Depending on the values of ai, aj, bi, bj, two cases arise, as depicted in Figure 2. Figures 2(a,b) form an inversion. When this happens, the weak join (a) is strictly weaker than the strong join (b). Figure 2(c) does not form an inversion; the weak and strong joins coincide in this case.

Fig. 2. (a) ai > bi, aj < bj; (b) the weak join is strictly weaker than the strong join; (c) ai > bi, aj > bj: the weak join is the same as the strong join

Therefore, a strong join of polyhedra that form an inversion gives rise to a half-space that is not discovered by the weak join. We now derive this "missing half-space" H analytically. The half-space subsumes both ψ1 and ψ2. A half-space that is a consequence of ψ1 : ei ≥ ai ∧ ej ≥ aj is of the form H : ei + λij ej ≥ ai + λij aj, for some λij ≥ 0. Similarly, for ψ2 we obtain H : ei + λij ej ≥ bi + λij bj. Equating the constant terms yields the equation ai + λij aj = bi + λij bj, whose solution is

λij = (ai − bi) / (bj − aj).

Note that requiring λij > 0 yields ai < bi and bj < aj. Therefore, ψ1, ψ2 contain a non-trivial common half-space iff they form an inversion.

Definition 7 (Inversion Join). Given ϕ1, ϕ2, the inversion join ϕ1 ⊔inv ϕ2 is computed as follows:
1. Compute the TCM T = Ineqs(ϕ1) ∪ Ineqs(ϕ2).
2. For each ei ∈ T, compute ai, bi as in Definition 5 using linear programming. At this point ϕ1 |= ei ≥ ai and ϕ2 |= ei ≥ bi. Let ϕw = ϕ1 ⊔w ϕ2 be the weak join.
3. For each pair ei, ej forming an inversion, consider the expression ei + λij ej ≥ ai + λij aj, with λij as defined above.
4. The inversion join is the conjunction of ϕw and all the inversion expressions generated in step 3. Optionally, simplify the result by removing redundant inequalities.

Fig. 3. Inversion join over two polyhedra; (a), (b) and (c) are the newly discovered relations

Example 6. Figure 3 shows the result of an inversion join over the two input polyhedra ϕ1, ϕ2 used in Example 3. Example 4 shows the TCM T and the ai, bi values. There are three inversions:

#    Expressions  Subsuming Half-Space
(a)  1, 3         y + 3x + 45 ≥ 0
(b)  2, 4         −y − 6x + 35 ≥ 0
(c)  1, 2         y − 9x + 65 ≥ 0

The "Expressions" column in the table above refers to expressions by their row numbers in the table of Example 4. From Figure 3, note that (c) is redundant. Therefore the result of the join may require redundancy elimination (an algorithm is provided later in this section). This result is equivalent to the result of the strong join in Example 3.

Theorem 2. Let ϕ1, ϕ2 be two polyhedra. Then ϕ1 ⊔s ϕ2 |= ϕ1 ⊔inv ϕ2 |= ϕ1 ⊔w ϕ2.

The inversion join requires as many LP queries as the weak join and an additional O(m^2 n) arithmetic operations to compute the inversions, where m is the number of inequalities in T and n the dimensionality.

Note. The descriptions of the weak and inversion joins treat each equality as two inequalities. The resulting join can be made more precise if, additionally, the equality join defined by Karr's analysis [14] is computed and conjoined to the result. This can be achieved in time polynomial in the number of equalities.
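On top of the weak join, the inversion step of Definition 7 only needs the bounds already computed and the closed form for λij. A sketch of ours, reusing lower_bound and weak_join from above (the linear-independence test of Definition 6, condition 2, is omitted for brevity):

    # Sketch of the inversion join (Definitions 6 and 7).

    def inversion_join(phi1, phi2):
        T = [row for row, _ in phi1] + [row for row, _ in phi2]
        a = [lower_bound(e, phi1) for e in T]
        b = [lower_bound(e, phi2) for e in T]
        out = weak_join(phi1, phi2)
        for i, ei in enumerate(T):
            for j, ej in enumerate(T):
                ai, bi, aj, bj = a[i], b[i], a[j], b[j]
                if not all(map(np.isfinite, (ai, bi, aj, bj))):
                    continue
                if ai < bi and bj < aj:            # an inversion (lam > 0)
                    lam = (ai - bi) / (bj - aj)
                    e = [x + lam * y for x, y in zip(ei, ej)]
                    out.append((e, -(ai + lam * aj)))  # e >= ai + lam*aj
        return out          # optionally: eliminate redundant rows

The symmetric case of Definition 6, condition 3, is covered automatically because the double loop visits each pair in both orders.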
Post Condition. The post condition computes the image of an assertion ϕ under a transition relation of the form ξ ∧ x′ = Ax + b. This is equivalent to the image of ϕ ∧ ξ under the affine transformation x′ = Ax + b. If the matrix A is invertible, then this image is easily computed by substituting x = A⁻¹(x′ − b) [9]. On the other hand, it is frequently the case that A is not invertible. We present three alternatives: the strong, weak and restricted post conditions.

Strong Post. The strong post is computed by first enumerating the generators of ϕ ∧ ξ. Each generator is transformed under the operation Ax + b. The resulting polyhedron is generated by these images. Conversion back to the constraint representation completes the computation.

Weak Post. The weak post requires a TCM T′ labeling the post-location of the transition. Alternatively, this TCM may be derived from Ineqs(η(ℓ′)), where η(ℓ′) labels the post-location. Given such a TCM, we may use the post operation defined for TCMs [18] to compute the weak post.

Note. The post condition computation for equalities can be performed separately using the image operation defined for Karr's analysis. This can be added to the result, thus strengthening the weak post.

k-Restricted Post. The k-restricted post condition improves upon the weak post by using the monotonicity of the strong post operation (see [8]), similarly to the k-restricted join algorithm. Considering a subset of up to k inequalities ψ, we may compute the strong post of ψ and add the result conjunctively to the weak post. The result improves upon the precision of the weak post. As is the case for the join, the cases k = 1, 2 can be treated efficiently.

Example 7. Consider the polyhedron ϕ : x − y ≥ 0 ∧ x ≤ 0 ∧ x + y + 3 ≥ 0 and the transformation x := x + 3, y := 0. Consider the TCM T = {x − y, x + y, y − x, −x − y}. The weak post of ϕ w.r.t. T is computed by finding bounds for each expression. For instance, the bound for x − y is discovered by solving:
minimize x′ − y′ subject to ϕ ∧ x′ = x + 3 ∧ y′ = 0.
The overall weak post is obtained by solving 4 LPs, one for each element of T:
ϕw : 3 ≥ x − y ≥ 1.5 ∧ 3 ≥ x + y ≥ 1.5.
This is strictly weaker than the strong post ϕs : 3 ≥ x ≥ 1.5 ∧ y = 0. The 1-restricted post computes the post condition of each half-space in ϕ separately. This yields the result y = 0 for each of the three half-spaces. Conjoining the 1-restricted post with the weak post yields the same result as the strong post in this example.

Note. The projection operation, an important primitive for interprocedural analysis, can be implemented along the same lines as the post condition operation, yielding strong, weak and restricted projection operations.

Feasibility, Inclusion Check and Redundancy Elimination. There exist polynomial time algorithms, efficient in practice, for checking the feasibility of a polyhedron and inclusion between two polyhedra.

Feasibility. The simplex method can be used to check the feasibility of a given linear inequality assertion ϕ. In practice, we solve the optimization problem: minimize 0 subject to ϕ. An answer of +∞ indicates the infeasibility of ϕ.

Inclusion Check. As a primitive, consider the problem of checking whether a given inequality e ≥ 0 is entailed by ϕ, posing the LP: minimize e subject to ϕ. If the optimal solution is a, it follows from the definition of an LP problem that ϕ |= e ≥ a. Thus subsumption holds iff a ≥ 0. In order to decide whether ϕ |= Ax + b ≥ 0, we decide whether the entailment holds for each half-space Ai x + bi ≥ 0.

Redundancy Elimination (Simplification). Each inequality is checked for subsumption by the remaining inequalities using the inclusion check primitive.
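The feasibility, inclusion and redundancy primitives then come almost for free from the lower_bound query of the earlier sketches; the row representation (a, b) for a·x + b ≥ 0 is again ours.

    # Sketch of the LP-based inclusion check and redundancy elimination.

    def entails(phi, row):
        # phi |= e + b >= 0  iff  the minimum of e over phi is >= -b
        e, b = row
        return lower_bound(e, phi) >= -b

    def includes(phi1, phi2):
        # phi1 |= phi2 iff phi1 entails every half-space of phi2
        return all(entails(phi1, row) for row in phi2)

    def simplify(phi):
        kept = list(phi)
        for row in list(phi):
            rest = [r for r in kept if r is not row]
            if rest and entails(rest, row):    # row is redundant: drop it
                kept = rest
        return kept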
Widening. The standard widening of Cousot and Halbwachs may be implemented efficiently using linear programming. Let ϕ1, ϕ2 be two polyhedra such that ϕ1 |= ϕ2, and assume that both are satisfiable. We seek to drop any constraint ei ≥ 0 in ϕ1 that is not a consequence of ϕ2. This can be achieved by the inclusion test primitive described above.

Definition 8 (Standard Widening). The standard widening of two polyhedra ϕ1 |= ϕ2, denoted ϕ = ϕ1∇ϕ2, is computed as follows:
1. Check the satisfiability of ϕ1, ϕ2. If either one is unsatisfiable, widening reduces to their join.
2. Otherwise, for each ei ∈ Ineqs(ϕ1), compute bi = minimize ei subject to ϕ2. If bi < 0 then drop the constraint ei ≥ 0 from ϕ1.

This operator is identical to the widening defined by Cousot and Halbwachs [9]. The operator may be improved by additionally computing the join of the equalities in both polyhedra. The work of Bagnara et al. [1] presents several approaches to improving the precision of widening operators.
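On the same row representation, Definition 8 is essentially a one-liner, reusing entails from the previous sketch (satisfiability handling is left out of this sketch):

    # Sketch of the standard widening (Definition 8).

    def widen(phi1, phi2):
        # assumes phi1 |= phi2 and both satisfiable; otherwise join instead
        return [row for row in phi1 if entails(phi2, row)]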
4 Performance

We have implemented many of the ideas in this paper in the form of an abstract domain library written in Ocaml. Our library uses GLPK [15] to solve LP queries, and PPL [2] to convert between the constraint and generator representations of polyhedra. Such conversions are used to implement the strong join and post condition. Communication between the different libraries is implemented using Unix pipes; as a result, the communication overhead is significant for small examples.

Choosing Domain Operations. We have provided several options for the join and the post condition operations. In practice, one can envision many strategies for choosing among these operations. Our implementation chooses between the strong and the weak versions based on the sizes of the input polyhedra. Strong post conditions and joins are used for smaller polyhedra (up to roughly 40 variables+constraints), the inversion join is used for polyhedra with roughly hundreds of variables+constraints, and the weak versions are used for larger polyhedra. We observe empirically that the use of strong operations does not improve the result once the widening phase has started. Therefore, we resort to the weak join and post condition for the widening phase of the analysis.

4.1 Benchmark Examples

We applied our library to generate invariants for a few benchmark system models drawn from related projects such as FAST [3] and our previous work [18]. Table 1 shows the complexity of each system in terms of the number of variables (#vars), along with the performance of our technique of mixed strong, weak and inversion domain operations as compared with purely strong join/post operations implemented directly in C++ using the PPL library. We compare the running time and memory utilization of both implementations. Results were measured on an Intel Xeon II processor with 1 GB RAM. The last column compares the invariants generated: a "+" indicates that our technique discovers strictly stronger invariants, whereas a "=" denotes that the invariants are incomparable.

Table 1. Performance on benchmark examples. All times are in seconds and memory utilization in Mb.

Name (#vars)    #trans  Strong+Weak time  mem  Purely Strong time  mem   ±
req-grant(11)   8       3.14              5.7  0.1                 4.1   +
csm(13)         8       6.21              5.9  0.1                 4.2   =
c-pJava(18)     14      11.2              6.0  0.1                 4.1   =
multipool(18)   21      10.0              6.0  2.1                 9.2   +
incdec(32)      28      39.12             6.8  8.7                 10.4  =
mesh2x2(32)     32      33.8              6.4  18.53               66.2  =
bigjava(44)     37      46.9              7.2  256.2               55.3  =
mesh3x2(52)     54      122               8.1  > 1h                > 800 +

For small polyhedra, strong operations frequently outperform the weak domain operations in terms of time. However, their memory consumption seems asymptotically exponential. Therefore, weak domain operations yield a drastic performance improvement once the size of the benchmark examples increases beyond the physical memory capacity of the system. Comparing the invariants generated, it is interesting to note that the invariants produced by the two techniques are, for the most part, incomparable. While the inversion join is weaker than the strong join, the non-monotonicity of the widening operation and its dependence on the syntactic representation of the polyhedra cause the two versions to compute different invariants.

Analysis of πVC Programs. We applied our abstract domain library to analyze a subset of the C language called πVC, consisting of imperative programs over integers with function calls. The language features dynamically allocated arrays, records and recursive function calls, while excluding pointers. Parameters are passed by value and global variables are disallowed. The language incorporates invariant annotations by the user that are verified by the compiler using a background decision procedure. Our analysis results in sound annotations that aid the verifying compiler in checks for runtime safety, such as freedom from overflows, and, with optional user-supplied assertions, help prove functional correctness. Our analyzer is inter-procedural, using summaries to handle function calls in a context-sensitive manner. Our abstraction process models arrays in terms of their allocated sizes while treating their contents as unknowns. Integer operations such as multiplication, division and modulo are modeled conservatively so that soundness is maintained. The presence of recursive function calls requires that termination be ensured by limiting the number of summary instances per function and by widening on the summary preconditions. Table 2 shows the performance on implementations of standard sorting algorithms, string search algorithms and a part of the web2C code for converting Pascal-style writes into C-style printf functions, originally verified by Dor et al. [11]. The columns in Table 2 show the size of each program in lines of code and number of functions. An asterisk (*) identifies programs containing recursive functions. We place a check mark (√) in the "proves property" column if the resulting annotations themselves prove all array accesses and additional user provided assertions. Otherwise, the number of unproven accesses/assertions is indicated. Our analyzer proves a vast majority (≥ 90%) of the assertions valid, without any user interaction. Indirect array accesses such as a[b[i]] are a major reason for the false positives. We are looking into more sophisticated abstractions to handle such accesses. The invariants generated by both versions are similar for small programs, even though weak domain operations were clearly used during the analysis. The difference in performance becomes clearer as the size of the program increases. Our interface to the PPL library represents coefficients using long integers; this led to an overflow error while analyzing quicksort. In conclusion, we have designed and implemented efficient domain operations and applied our technique to verify interesting benchmark examples. We hope to extend our analyzer to handle essential features such as pointers and arrays.
Table 2. Performance of the invariant generator on benchmark programs

Description         #LOC  #fns  Weak+Strong time(sec)  mem(Mb)  Strong time(sec)  mem(Mb)   Proves Property
binary-search (*)   27    2     0.48                   7.8      0.4               7.5       √
insertionsort       37    1     2.9                    7.9      2                 7.8       √
heapsort            75    5     26.2                   9.8      23.0              9.6       √
quicksort (*)       106   4     2m                     13.2     overflow          overflow  √
Knuth-Morris-Pratt  110   4     9.4                    8.6      7.9               8.6       4
Boyer-Moore         106   3     33.7                   10.4     28.8              10.8      12
fixwrites (*)       270   10    4.2m                   26.5     > 75m             > 75M     28

Acknowledgments. Many thanks to Mr. Aaron Bradley for implementing the πVC front end, to the reviewers for their incisive comments, and to the developers of the PPL [2] and GLPK [15] libraries.

References

1. R. Bagnara, P. M. Hill, E. Ricci, and E. Zaffanella. Precise Widening Operators for Convex Polyhedra. In Static Analysis Symposium (SAS), volume 2694 of LNCS, pages 337–354. Springer-Verlag, 2003.
2. R. Bagnara, E. Ricci, E. Zaffanella, and P. M. Hill. Possibly Not Closed Convex Polyhedra and the Parma Polyhedra Library. In SAS, volume 2477 of LNCS, pages 213–229. Springer-Verlag, 2002.
3. S. Bardin, A. Finkel, J. Leroux, and L. Petrucci. FAST: Fast Acceleration of Symbolic Transition Systems. In Computer-Aided Verification (CAV), volume 2725 of LNCS. Springer-Verlag, July 2003.
4. F. Besson, T. Jensen, and J.-P. Talpin. Polyhedral Analysis of Synchronous Languages. In Static Analysis Symposium (SAS), volume 1694 of LNCS, pages 51–69, 1999.
5. V. Chvátal. Linear Programming. Freeman, 1983.
6. R. Clarisó and J. Cortadella. The Octahedron Abstract Domain. In Static Analysis Symposium (SAS), volume 3148 of LNCS, pages 312–327. Springer-Verlag, 2004.
7. P. Cousot and R. Cousot. Static Determination of Dynamic Properties of Programs. In Proceedings of the Second International Symposium on Programming, pages 106–130. Dunod, Paris, France, 1976.
8. P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In ACM Principles of Programming Languages (POPL), pages 238–252, 1977.
9. P. Cousot and N. Halbwachs. Automatic Discovery of Linear Restraints Among the Variables of a Program. In ACM POPL, pages 84–97, January 1978.
10. G. B. Dantzig. Programming in Linear Structures. USAF, 1948.
11. N. Dor, M. Rodeh, and M. Sagiv. CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C. In Proc. PLDI'03. ACM Press, 2003.
12. N. Halbwachs, Y. Proy, and P. Roumanoff. Verification of Real-Time Systems Using Linear Relation Analysis. Formal Methods in System Design, 11:157–185, 1997.
13. B. Jeannet. The convex polyhedra library New Polka. Available online from http://www.irisa.fr/prive/Bertrand.Jeannet/newpolka.html.
14. M. Karr. Affine Relationships Among Variables of a Program. Acta Informatica, 6:133–151, 1976.
15. A. Makhorin. The GNU Linear Programming Kit (GLPK), 2000. Available online from http://www.gnu.org/software/glpk/glpk.html.
16. Z. Manna. Mathematical Theory of Computation. McGraw-Hill, 1974.
17. A. Miné. A New Numerical Abstract Domain Based on Difference-Bound Matrices. In PADO II, volume 2053 of LNCS, pages 155–172. Springer-Verlag, May 2001.
18. S. Sankaranarayanan, H. B. Sipma, and Z. Manna. Scalable Analysis of Linear Systems Using Mathematical Programming. In Verification, Model Checking and Abstract Interpretation (VMCAI 2005), volume 3385 of LNCS, January 2005.
19. A. Simon, A. King, and J. M. Howe. Two Variables per Linear Inequality as an Abstract Domain. In LOPSTR, volume 2664 of Lecture Notes in Computer Science, pages 71–89. Springer, 2003.
Environment Abstraction for Parameterized Verification

Edmund Clarke(1), Muralidhar Talupur(1), and Helmut Veith(2)

(1) Carnegie Mellon University, Pittsburgh, PA, USA
(2) Technische Universität München, Munich, Germany

Abstract. Many aspects of computer systems are naturally modeled as parameterized systems, which renders their automatic verification difficult. In well-known examples such as cache coherence protocols and mutual exclusion protocols, the unbounded parameter is the number of concurrent processes which run the same distributed algorithm. In this paper, we introduce environment abstraction as a tool for the verification of such concurrent parameterized systems. Environment abstraction enriches predicate abstraction with ideas from counter abstraction; it enables us to reduce concurrent parameterized systems with unbounded variables to precise abstract finite state transition systems which can be verified by a finite state model checker. We demonstrate the feasibility of our approach by verifying the safety and liveness properties of Lamport's bakery algorithm and Szymanski's mutual exclusion algorithm. To the best of our knowledge, this is the first time both safety and liveness properties of the bakery algorithm have been verified at this level of automation.

1 Introduction

We propose a new method for the verification of concurrent parameterized systems which combines predicate abstraction [21] with ideas from counter abstraction [29]. In predicate abstraction, the memory state of a system is approximated by a tuple of Boolean values which indicate whether certain properties ("predicates") of the memory state hold or not. For example, instead of keeping all 64 bits for two integer variables x, y, predicate abstraction may just track the Boolean value of the predicate x > y. Counter abstraction, in contrast, is specifically tailored for concurrent parameterized systems composed of finite state processes: for each possible state s of a single finite state process, the abstract state contains a counter Cs which denotes the number of processes currently in state s. Thus, the process identities are abstracted away in counter abstraction. It can be argued that counter abstraction constitutes a very natural abstraction mechanism for protocols. In practice, the counters in counter abstraction are themselves abstracted, in that they are cut off at value 2. (This research was sponsored by the National Science Foundation (NSF) under grants no. CCR-9803774 and CCR-0121547. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of NSF. The third author was also supported by the EU GAMES Network.) Counter abstraction, however, has two main problems: first, it works only for finite state systems, and second, it assumes perfect symmetry, i.e., each process is identical to every other process in every respect. Well-known algorithms such as Lamport's bakery algorithm are not directly amenable to counter abstraction: the bakery algorithm has an infinite state space due to an unbounded integer variable, and also an inherent asymmetry due to its use of process id's.
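As a toy illustration of counter abstraction with the counters cut off at 2 (our own example, not from the paper): a concrete global state of n identical finite-state processes is mapped to one counter per local state, with values 0, 1, or "2" meaning "two or more".

    # Sketch of counter abstraction with cutoff 2.
    from collections import Counter

    def counter_abstract(local_states):
        """local_states: list of the n processes' local states."""
        counts = Counter(local_states)
        return {s: min(c, 2) for s, c in counts.items()}   # cutoff at 2

    # Five processes in a 3-state protocol; process identities vanish.
    print(counter_abstract(["idle", "idle", "trying", "critical", "idle"]))
    # -> {'idle': 2, 'trying': 1, 'critical': 1}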
In this paper, we will address the two disadvantages of counter abstraction by incorporating the idea of counter abstraction into a new form of predicate abstraction: since the state space is infinite, we do not count the processes in a given state as in traditional counter abstraction; instead we count the number of processes satisfying a given predicate. Note that the counters which we actually use in this paper are cut off at the value 1; such degenerate counters are evidently tantamount to existential quantifiers. Counter abstraction, too, usually needs only counters in the range [0..2]. Since our abstraction maintains the state of one process explicitly, a range of [0..1] for each counter suffices.

Our new form of abstraction is also different from common predicate abstraction frameworks: since the number of processes in a concurrent parameterized system is unbounded, the system does not have a single infinite-state model, but an infinite sequence of models which increase in complexity. Moreover, since the individual processes can have local data variables with unbounded range (e.g. integers), each of these models is an infinite-state system by itself. Thus, computing the abstract transition relation is a non-trivial task. Note that the predicates need to reflect the properties of a set of concurrent processes whose cardinality we do not know at verification time. To encode the necessary information into the abstract model, we will introduce environment predicates.

Environment Predicates. We use an asynchronous model which crucially distinguishes between the finite control variables of a process and the unbounded data variables of a process. The control variables are used to model the finite control of the processes, while the data variables can be read by other processes in order to modify their own data variables. The variables can be used in the guards of the other processes, thus facilitating a natural communication among the processes.¹

Figure 1 visualizes the intuition underlying environment abstraction. The grey box on the left-hand side represents a concrete state of a system with 16 concurrent processes. The different colors of the disks/processes represent the internal states of the processes, i.e., the positions of the program counter. The star-shaped graph on the right-hand side of Figure 1 represents an abstract state. The abstract state contains one distinguished process – called the reference process x – which is at the center of the star. In this example, the reference process x represents process 1 of the concrete state. The disks on the circumference of the star represent the environment of the reference process. Intuitively, the goal of the abstraction is to embed the reference process x of the abstract state into an abstract environment as rich as the environment which process 1 has in the concrete state. Thus, the abstract state represents the concrete state "from the point of view of process 1."

¹ We assume that transitions involving global conditions are treated atomically, i.e., while a process is evaluating e.g. a guard, no other process makes any transition. This simplification – which we shall call the atomicity assumption further on – is implicit in other works on parameterized verification, see [3, 5, 6, 29].

Fig. 1. Abstraction Mapping

To describe the environment of a process, we need to consider the relationships which can hold between the data variables of two processes.
We can graphically indicate a specific relationship between any two processes by a corresponding arrow between the processes; the form of the arrow (full, dashed, etc.) determines which relationship the two processes have. In the figure, we assume that we have only two relationships R_1, R_2. For example, R_1(x, y) might say that the local variable t of process x has the same value as local variable t in process y, while R_2(x, y) might say that t has different values in processes x and y. Relationship R_1 is indicated by a full arrow, and R_2 is indicated by a dashed arrow. For better readability, not all relationships between the 16 processes are drawn.

More precisely, the environment of the reference process is described as follows: we enumerate all the ways in which the data variables of the reference process can relate to the data variables of a different process, as well as all possible program counter values which the other process can take. In our example, we have 2 relationships R_1, R_2 and 4 program counter positions, giving 8 different environment conditions. Therefore, the abstract state contains 8 environment processes on the circumference. For each of these 8 environment conditions, we indicate by the absence or presence of a bar whether this environment condition is actually satisfied by some process in the concrete state. For example, the dashed arrow from process 1 to the vertically striped process 2 in the concrete state necessitates a dashed arrow from x to a vertically striped process in the abstract state. Similarly, since there is no full arrow starting at process 1 in the concrete state, all full arrows in the abstract state have a bar. An environment predicate is a quantified formula which indicates the presence or absence of an environment condition for the reference process. We will give a formal definition of these notions in Section 4.

Note that a single abstract state in general represents an infinite number of concrete states. Moreover, a given concrete state gives rise to several abstract states, each of which is induced by choosing a different possible reference process. For example, the concrete state in Figure 1 may induce up to 16 abstract states, one for each process.

Existential Abstraction for Parameterized Systems. We construct an abstract system by a variant of existential abstraction. We include an abstract transition if in some concrete instance of the parameterized system we can find a concrete transition between concrete states which match the abstract states with respect to the same reference process. The abstract model obtained by environment abstraction is a sound abstraction which preserves both safety and liveness properties.

In this paper we use a simple input language which is general enough to describe most practically relevant symmetric protocols, and to demonstrate the underlying principles of our abstraction. We believe that our abstraction method can be naturally generalized to additional constructs as well. To handle liveness we augment the abstract model using an approach suggested by [29]. Note that in contrast to the indexed predicates method [24, 25], our approach constructs an abstract transition system, instead of computing the set of reachable abstract states. This feature of our approach is crucial for verifying liveness properties.

Tool Chain and Experiments. Our approach provides an automated tool chain in the tradition of model checking.
1. The user feeds the protocol, described in our language, to the verification tool.
2. The environment abstraction tool extracts a finite state model from the process description and puts the model in NuSMV format.
3. NuSMV verifies the specified properties.

Using the abstraction method described here, we have been able to verify automatically the safety and liveness properties of two well-known mutual exclusion algorithms, namely Lamport's bakery algorithm [26] and Szymanski's algorithm [31]. While safety and liveness properties of Szymanski's algorithm have been automatically verified with the atomicity assumption by Baukus et al. [5], this is the first time both safety and liveness of Lamport's bakery algorithm have been verified (with the atomicity assumption) at this level of automation.

2 Discussion of Related Work

Verification of parameterized systems is well known to be undecidable [2, 30]. Many interesting approaches to this problem have been developed over the years, including the use of symbolic automata-based techniques [1, 23, 8, 7], invariant-based techniques [3, 28], predicate abstraction [24], and the exploitation of symmetry [11, 14, 17, 15, 16]. Some of the earliest work on verifying parameterized systems includes works by Browne et al. [9], German and Sistla [20], and Emerson and Sistla [16]. In the rest of this section, we will concentrate on the work which is closest to our approach.

Counter Abstraction [4, 12, 13, 29, 20] is an intuitive method to use on parameterized systems. Pnueli et al. [29], who coined the term counter abstraction, show how systems composed of symmetric and finite state processes can be handled automatically. Protocols which either break symmetry by exploiting knowledge of process id's or which have infinite state spaces, however, require manual intervention. Thus, the verification of Szymanski's and the bakery protocol in [29] requires manual introduction of new variables. The method also makes assumptions on the atomicity of guards.

The Invisible Invariants method was introduced in a series of papers [28, 3, 18, 19]. The idea behind this technique is to find an invariant for the parameterized system by examining concrete systems for low valuations of the parameter(s). The considered system model is powerful enough to model various mutual exclusion and cache coherence protocols which do not need unbounded integer variables. In [3], a modified version of the bakery algorithm is verified: the original bakery algorithm is modified to eliminate unbounded integer variables. In contrast, the method proposed in the current paper can handle the original bakery protocol without such modifications. The authors of [3] implicitly make the assumption that guards are evaluated atomically.

The Indexed Predicates method [24, 25] is a new form of predicate abstraction for infinite state systems. This method relies on the observation that complex invariants are built of simple indexed predicates, i.e., predicates which have free index variables. By choosing a set of indexed predicates appropriately, one can use a modified form of predicate abstraction to find a system invariant. In comparison to the above-mentioned work, this method makes weaker atomicity assumptions. Our method is also based on predicate abstraction; in fact, the notion of a reference process can be viewed as an "index" in the indexed predicates framework.
However, the contribution we make is very different: (i) We focus on concurrent parameterized systems, which enables us to use the specific and precise technique of environment abstraction. Our abstraction method exploits and reflects the structure of communicating protocols. (ii) In the indexed predicates approach there is no notion of an abstract transition relation. Thus, their approach, which is tailored for computing reachable states, works only for safety properties. In our framework, the abstract model does have a transition relation, and we can verify liveness properties as well as safety properties. (iii) The indexed predicates technique requires manual intervention or heuristics for choosing appropriate predicates. In contrast, our technique is automatic.

A method pioneered by Baukus et al. [5] models an infinite class of systems by a single WS1S system which is then abstracted into a finite state system. While this is an automatic technique, it cannot handle protocols such as the bakery algorithm which have unbounded integer variables. The global conditions are assumed to be atomic.

The inductive method of [27], based on model checking, is applied to verify both safety and liveness of the bakery algorithm, notably without assuming atomicity. This approach however is not automatic: the user is required to provide lemmas and theorems to prove the properties under consideration. Our approach, in contrast, is fully automatic.

Regular model checking [8] is an interesting verification technique very different from ours. It is based on modeling systems using regular languages. This technique is applicable to a wide variety of systems, but it requires the user to express systems in terms of regular languages, which is a non-trivial process requiring user ingenuity.

Henzinger et al. [22] also consider the problem of an unbounded number of threads, but the system model they consider is different. The communication between threads occurs through shared variables, whereas in our case, each process can look at the state of the other processes.

In summary, automatic methods such as the WS1S method and counter abstraction are restricted in the systems they can handle and make use of the atomicity assumption. In contrast, the methods which make no or weaker assumptions about atomicity tend to require user intervention, either in the form of appropriate predicates or in the form of lemmas and theorems which lead to the final result. In this paper, we assume atomicity of guards and describe a method which can handle well-known mutual exclusion protocols such as the bakery and Szymanski's protocols automatically. Importantly, our method is able to abstract and handle unbounded integer variables. To the best of our knowledge, this is the first time that the bakery algorithm (under the atomicity assumption) has been verified automatically.

The method of environment abstraction described here has a natural extension which eliminates the atomicity assumption. This extension of our method, which will be described in future work, has been used to verify the bakery algorithm and Szymanski's protocol without any restrictions.

3 System Model

Parameterized Systems. We consider asynchronous systems composed of an unbounded number of processes which communicate via shared variables. Each process can modify its own variables, but has only read access to the variables of the other processes.
Each process has two sets of variables: the control variables F = {f_1, . . . , f_c}, where each f_i has a finite, constant range, and the data variables U = {u_1, . . . , u_d}, where each u_i is an unbounded integer. Intuitively, the two sets of variables serve different purposes: (i) The control variables in F determine the internal control state of the process. As they have a finite domain, the variables in F amount to the finite control of the process. (ii) The data variables in U contain actual data which can be read by other processes to calculate their own data variables.

All processes run the same protocol P. For a given protocol P, a system consisting of K processes running P will be denoted by P(K). Thus, the number K of processes is the system parameter. We will write P(N) to denote the infinite collection P(2), P(3), . . . of systems. To be able to refer to the state of individual processes in a system P(K), we will assume that each process has a distinct and fixed process id from the range [1..K]. We will usually refer to processes and their variables via their process id's. In particular, f_a[i] and u_b[i] denote the variables f_a and u_b of the process with id i. The set of local states of a process i is then naturally given by the different valuations of the tuple ⟨f_1[i], . . . , f_c[i], u_1[i], . . . , u_d[i]⟩. The global state of system P(K) is given by a tuple ⟨L_1, . . . , L_K⟩, where each L_i is the local state of process i. The initial state of each process is given by a fixed valuation of the local state variables. Note that all processes in a system P(K) are identical except for their id's. Thus, the process id's are the only means to break the symmetry between the processes. A process can use the reserved expression slf to refer to its own process id. When a protocol text contains the variables f_a or u_b without explicit reference to a process id, this stands for f_a[slf] and u_b[slf], respectively.

A concrete valuation of the variables in F determines the control state of a process. Without loss of generality, we can assume for simplicity that F has only one variable pc which determines the control state of a process. Thus, in the rest of the paper F = {pc}, although in program texts we may take the freedom to use more than one finite-range control variable. A formula of the form pc = const is called a control assignment. The range of pc is called the set of control locations.

Guarded Transitions and Update Transitions. We will describe the transition relation of the processes in terms of two basic constructs: guarded transitions for the finite control, and the more complicated update transitions for modifying data variables. A guarded transition has the form

    pc = L1 : if ∀otr ≠ slf. G(slf, otr) then goto pc = L2 else goto pc = L3

or, shorter,

    L1 : if ∀otr ≠ slf. G(slf, otr) then goto L2 else goto L3

where L1, L2, L3 are control locations. In the guard ∀otr ≠ slf. G(slf, otr), the variable otr ranges over the process id's of all other processes. The condition G(slf, otr) is any formula involving the data variables of processes slf, otr and the pc variable of otr. The semantics of a guarded transition is straightforward: in control location L1, the process evaluates the guard and changes to control location L2 or L3 accordingly.

Update transitions are needed to describe protocols such as the bakery algorithm where a process computes a data value depending on all values which it can read from other processes.
For example, the bakery algorithm has to compute the maximum of a certain data variable (the "ticket variable") over all other processes. Thus, we define an update transition to have the general form

    L1 : for all otr ≠ slf
             if T(slf, otr) then u_k := φ(otr)
         goto L2

where L1 and L2 are control assignments, and T(slf, otr) is a condition involving data variables of processes slf, otr. The semantics of the update transition is best understood in an operational manner: in control location L1, the process scans over all the other processes (in nondeterministically chosen order) and for each process otr checks if the formula T(slf, otr) is true. In this case, the process changes the value of its data variable u_k according to u_k := φ(otr), where φ(otr) is an expression involving variables of process otr. Thus, the variable u_k can be reassigned multiple times within a transition. Finally, the process changes to control location L2. We assume that both guarded and update transitions are atomic, i.e., during their execution no other process makes a move.

Example 1. As an example of a protocol written in this language, consider a parameterized system P(N) where each process P has one finite variable pc : {1, 2, 3} representing a program counter and one unbounded integer variable t : Int, and executes the following program:

    1 : goto 2
    2 : if ∀otr ≠ slf. t[slf] = t[otr] then goto 3
    3 : t := t[otr] + 1; goto 1

The statement 1 : goto 2 is syntactic sugar for

    pc = 1 : if ∀otr ≠ slf. true then goto pc = 2 else goto pc = 1.

Similarly, 3 : t := t[otr] + 1; goto 1 is syntactic sugar for the update transition

    pc = 3 : for all otr ≠ slf, if true then t := t[otr] + 1; goto pc = 1.

This example also illustrates that the most commonly occurring transition statements in protocols can be written in our input language.

Note that we have not specified the operations and predicates which are used in the conditions and assignments. Essentially, this choice depends on the protocols and the power of the decision procedures used. For the protocols considered in this paper, we need linear order and equality on data variables as well as incrementation, i.e., addition by 1. The full version of the paper [10] contains the descriptions of the bakery algorithm and Szymanski's algorithm in terms of our language.

4 Environment Abstraction

In this section, we describe the principal framework of environment abstraction. In Section 5 we will discuss how to actually compute abstract models for the class of parameterized systems introduced in the previous section. Both tasks are non-trivial, as we need to construct a finite abstract model which reflects the properties of P(K) for all K ≥ 1. We shall write P(N) |= Φ to say that P(K) |= Φ for all parameters K > 1. Given a specification Φ and a system P(N), we will construct an abstract model P^A and an abstract specification Φ^A such that P^A |= Φ^A implies P(N) |= Φ. The converse does not have to hold, i.e., the abstraction is sound but not complete.

We will first describe how to construct the abstract model. We have already informally visualized and discussed the abstraction concept using Figure 1. More formally, our approach is best understood by viewing an abstract state as a description ∆(x) of the computing environment of a reference process x. Since x is a variable, we can then meaningfully say that the description ∆(x) holds true or false for a concrete process. We write g |= ∆(p) to express that in a global state g, ∆(x) holds true for the process p.
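Before making the descriptions ∆(x) precise, it may help to keep an executable picture of the concrete systems in mind. The following minimal sketch (ours; only the protocol is taken from Example 1, and we assume the else branch of location 2 stays at 2) executes atomic transitions of a system P(K) under the atomicity assumption:

    import random

    def step(pc, t, i):
        """One atomic transition of process i in the protocol of Example 1;
        pc[j] and t[j] hold the variables of process j."""
        others = [j for j in range(len(pc)) if j != i]
        if pc[i] == 1:                      # 1 : goto 2
            pc[i] = 2
        elif pc[i] == 2:                    # 2 : if forall otr != slf . t[slf] = t[otr]
            if all(t[i] == t[j] for j in others):
                pc[i] = 3                   # then goto 3; else remain at 2
        else:                               # 3 : update transition with guard true;
            for j in others:                # t may be reassigned several times
                t[i] = t[j] + 1
            pc[i] = 1                       # goto 1

    K = 4
    pc, t = [1] * K, [0] * K
    for _ in range(20):                     # one nondeterministic interleaving
        step(pc, t, random.randrange(K))
    print(pc, t)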
An abstract state (i.e., a description ∆(x)) contains (i) detailed information about the current internal state of x and (ii) information about the internal states of the other processes and their relationship to x. Since the number of other processes is not fixed, we can either count the number of processes which are in a given relationship to x or, as in the current paper, keep track of the existence of such processes.

Technically, our descriptions reuse the predicates which occur in the control statements of the protocol description. Let S be the number of control locations in the program P. The internal state of a process x can be described by a predicate of the form pc[x] = L, where L ∈ {1..S} is a control location. In order to describe the relations between the data variables of different processes, we collect all predicates EP_1(x, y), . . . , EP_r(x, y) which occur in the guards of the program. From now on we will refer to these predicates as the inter-predicates of the program. Since in most practical protocols synchronization between processes involves only one or two data variables, the number of inter-predicates is usually quite small. The relationship between a process x and a process y is now described by a formula of the form

    R_i(x, y) = ±EP_1(x, y) ∧ . . . ∧ ±EP_r(x, y)

where ±EP_i stands for EP_i or its negation ¬EP_i. It is easy to see that there are 2^r possible relationships R_1(x, y), . . . , R_{2^r}(x, y) between x and y. In the example of Figure 1, the two relationship predicates R_1, R_2 are visualized by full and dashed arrows.

Fact 1. The relationship conditions R_1(x, y), . . . , R_{2^r}(x, y) are mutually exclusive.

Before we explain the descriptions ∆(x) in detail, let us first describe their most important building blocks, which we call environment predicates. An environment predicate expresses that for process x we can find another process y which has a given relationship to process x and a certain internal state. The environment predicates thus have the form

    ∃y. y ≠ x ∧ R_i(x, y) ∧ pc[y] = j.

An environment predicate says the following: there exists a process y, different from x, whose relationship to x is described by the EP predicates in R_i, and whose internal state is j. There are T := 2^r × S different environment predicates; we name them E_1(x), . . . , E_T(x), and their quantifier-free matrices E_1(x, y), . . . , E_T(x, y). Note that each E_k(x, y) has the form y ≠ x ∧ R_i(x, y) ∧ pc[y] = j.

Fact 2. If an environment process y satisfies an environment condition E_i(x, y), then it cannot simultaneously satisfy any other environment condition E_j(x, y), i ≠ j.

Fact 3. Let E_i(x, y) be an environment condition and G(x, y) be a boolean formula over the inter-predicates EP_1(x, y), . . . , EP_r(x, y) and predicates of the form pc[y] = L. Then either E_i(x, y) ⇒ G(x, y) or E_i(x, y) ⇒ ¬G(x, y).

We are ready to return to the descriptions ∆(x). A description ∆(x) has the format

    pc[x] = i ∧ ±E_1(x) ∧ ±E_2(x) ∧ · · · ∧ ±E_T(x), where i ∈ [1..S].    (∗)

Intuitively, a description ∆(x) therefore gives detailed information on the internal state of process x and on how the other processes are related to process x. Note the correspondence of ∆(x) to the abstract state in Figure 1: the control location i determines the color of the central circle, and the E_j determine the processes surrounding the central one. We will now represent descriptions ∆(x) by tuples of values, as usual in predicate abstraction.
The possible descriptions (∗) differ only in the value of the program counter pc[x] and in which of the E predicates carry negations. Denoting negation by 0 and absence of negation by 1, every description ∆(x) can be identified with a tuple ⟨pc, e_1, . . . , e_T⟩, where pc is a control location and each e_i is a Boolean variable. From this point of view, we have two ways to speak about abstract states: as descriptions ∆(x), and as tuples ⟨pc, e_1, . . . , e_T⟩. Thinking of abstract states as descriptions is more intuitive in the conceptual phase of this work, while the latter approach is more in line with traditional predicate abstraction, and closer to the algorithms we use.

Example 2. Consider again the protocol shown in Example 1. There is only one inter-predicate, EP_1(x, y) = (t[x] = t[y]). Thus we have two possible relationship conditions, R_1(x, y) = (t[x] = t[y]) and R_2(x, y) = (t[x] ≠ t[y]). Consequently, we have 6 different environment predicates:

    E_1(x) = ∃y ≠ x. pc[y] = 1 ∧ R_1(x, y)        E_4(x) = ∃y ≠ x. pc[y] = 1 ∧ R_2(x, y)
    E_2(x) = ∃y ≠ x. pc[y] = 2 ∧ R_1(x, y)        E_5(x) = ∃y ≠ x. pc[y] = 2 ∧ R_2(x, y)
    E_3(x) = ∃y ≠ x. pc[y] = 3 ∧ R_1(x, y)        E_6(x) = ∃y ≠ x. pc[y] = 3 ∧ R_2(x, y)

The abstract state then is a 7-tuple ⟨pc, e_1, . . . , e_6⟩, where pc refers to the internal state of the reference process x. For each i ∈ [1..6], the bit e_i tells whether there is an environment process y ≠ x such that the environment predicate E_i(x) becomes true.

Definition 1 (Abstract States). Given a parameterized system P(N) with control locations {1, .., S} and environment predicates E_1(x), . . . , E_T(x), the abstract state space contains tuples ⟨pc, e_1, . . . , e_T⟩, where

– pc ∈ {1, .., S} denotes the control location of the reference process;
– each e_j is a Boolean variable corresponding to the predicate E_j(x).

Since the concrete system P(K) contains K processes, a state s ∈ P(K) can give rise to up to K different abstract states, one for every different choice of the reference process.

Definition 2 (Abstraction Mapping). Let P(K), K > 1, be a concrete system and p ∈ [1..K] be a process. The abstraction mapping α_p induced by p maps a global state g of P(K) to an abstract state ⟨pc, e_1, . . . , e_T⟩ where pc is the value of pc[p] in state g, and for all e_j we have e_j = 1 ⇔ g |= E_j(p).

Definition 3 (Abstract Model). The abstract model P^A is given by the transition system (S^A, Θ^A, ρ^A) where

– S^A = {1, .., S} × {0, 1}^T, the set of abstract states, contains all valuations of the tuple ⟨pc, e_1, . . . , e_T⟩.
– Θ^A, the set of initial abstract states, is the set of abstract states ŝ for which there exist a concrete initial state s of a concrete system P(K), K > 1, and a concrete process p with α_p(s) = ŝ.
– ρ^A ⊆ S^A × S^A is a transition relation on the abstract states defined as follows: there is a transition from abstract state ŝ_1 to abstract state ŝ_2 if there exist (i) a concrete system P(K), K > 1, with a process p, and (ii) a concrete transition from concrete state s_1 to s_2 in P(K) such that α_p(s_1) = ŝ_1 and α_p(s_2) = ŝ_2.
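Definition 2 can be read as a small program. The following minimal sketch (ours; all identifiers are illustrative) computes α_p for the system of Examples 1 and 2, enumerating the six environment predicates in the order E_1, . . . , E_6:

    from itertools import product

    # Abstraction mapping alpha_p (Definition 2) for Examples 1 and 2:
    # one inter-predicate EP_1(x, y) := t[x] = t[y], hence relationships
    # R_1 (equal) and R_2 (different) and 2 x 3 = 6 environment predicates.
    S = [1, 2, 3]                           # control locations

    def alpha(pc, t, p):
        """Map the concrete global state (pc, t) to <pc, e_1, ..., e_6>
        with respect to the reference process p."""
        bits = []
        for equal, loc in product([True, False], S):   # E_1, ..., E_6
            bits.append(int(any(q != p and pc[q] == loc
                                and (t[q] == t[p]) == equal
                                for q in range(len(pc)))))
        return (pc[p],) + tuple(bits)

    print(alpha([2, 1, 2, 3], [5, 5, 7, 7], 0))        # (2, 1, 0, 0, 0, 1, 1)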
4.1 Specifications

We will now focus on the properties that we want to verify. By a one process control condition we mean a boolean formula over expressions of the form pc[x] = L, L ∈ {1, .., S}. By a two process control condition we mean a boolean formula over expressions of the form pc[x] = L_1, pc[y] = L_2, where L_1, L_2 ∈ {1, .., S}.

Definition 4 (Two-Indexed Safety Properties). A two-indexed safety property is a specification ∀x, y. AG φ(x, y), where x, y are variables which refer to distinct processes, and φ(x, y) is a two process control condition.

Definition 5 (Liveness Properties). A liveness property is a specification of the form ∀x. AG(φ(x) → F ψ(x)), where φ(x) and ψ(x) are one process control conditions.

A standard example of a two-indexed safety property is the mutual exclusion property ∀x, y. AG ¬(pc[x] = crit ∧ pc[y] = crit), where crit is the control location of the critical section. An example of a liveness property is the formula ∀x. AG(pc[x] = try → F pc[x] = crit), which expresses that a process gets to enter the critical section if it wants to.

We first show how to abstract a formula φ(x, y) without any temporal operators. The abstraction φ^A of φ(x, y) is a predicate over the abstract states that is satisfied by those and only those abstract states ŝ for which there exist a system P(K), K > 1, with a process p, and a global state s of P(K) such that α_p(s) = ŝ and ∀q ≠ p. (s |= φ(p, q)). Intuitively, we treat x as the reference process and y as an environment process and find which abstract states correspond to the concrete formula φ(x, y). Similarly, for a single-index property φ(x), its abstraction φ^A is the predicate that is satisfied by those and only those abstract states ŝ for which there exist a system P(K), K > 1, with a process p, and a global state s of P(K) such that α_p(s) = ŝ and s |= φ(p).

Now we can define the abstract specifications: the abstraction of a two-indexed safety property ∀x, y. AG φ(x, y) is the formula AG φ^A. The abstraction of a single-indexed liveness property ∀x. AG(φ(x) → F ψ(x)) is the formula AG(φ^A → F ψ^A).

Theorem 1 (Soundness of Abstraction). Let P(N) be a parameterized system and let P̃^A be an over-approximation of its abstraction P^A. Given any two-indexed safety or single-indexed liveness property Φ and its abstraction Φ^A, we have that P̃^A |= Φ^A implies P(N) |= Φ.

4.2 Extensions for Fairness and Liveness

The abstract model that we have described, while sound, might be too coarse in practice to verify liveness properties. The reason is twofold:

(i) Spurious Infinite Paths. Our abstract model may have infinite paths which cannot occur in any concrete system. This happens when two concrete states s_1 and s_2, where s_1 transitions to s_2, both map to the same abstract state ŝ, leading to a self-loop involving ŝ. Such a self-loop can lead to a spurious infinite path which hinders the verification of liveness properties.

(ii) Fairness Conditions. Liveness properties are usually expected to hold under some fairness conditions. A typical example of a fairness condition is that every process x must leave the critical section a finite time after entering it. This is expressed formally by the fairness condition pc[x] ≠ crit. In this paper we will consider fairness conditions pc[x] ≠ L, where L is a control location. Liveness properties are then expected to hold on fair paths: an infinite path in a concrete system P(K), K ≥ 1, is fair only if the fairness condition pc[i] ≠ L holds infinitely often for each process i.

To handle these situations, we adapt a method developed by Pnueli et al. [29] in the context of counter abstraction to our environment abstraction. To this end, we augment our abstract model by adding new Boolean variables from_i, to_i for every i ∈ [1..T].
Thus our new abstract states are tuples ⟨pc, e_1, . . . , e_T, from_1, . . . , from_T, to_1, . . . , to_T⟩. We will now briefly describe this extension. Intuitively, the new from, to variables keep track of the immediate history of an abstract state, that is, the last step by which the abstract state was reached. The variable from_i is true if a process y which satisfied E_i(x, y) in the previous state does not satisfy E_i(x, y) in the new state. Similarly, the variable to_i is true if the active process, having satisfied E_j(x, y), j ≠ i, in the previous state, satisfies E_i(x, y) in the new state.

To eliminate the spurious infinite paths arising from loops described in item (i) above, we add for each i ∈ [1..T] a compassion condition [29] ⟨from_i, to_i⟩ which says: if from_i = true holds infinitely often in a path, then to_i = true must hold infinitely often as well.

Let us now turn to item (ii). Given a concrete fairness condition of the form pc[x] ≠ L, the corresponding abstract fairness condition for the reference process is given by pc ≠ L. Moreover, we introduce fairness conditions ¬(from_i = false ∧ e_i = 1) for all those environments E_i(x, y) which require process y to be in control location L, i.e., those E_i(x, y) which contain the subformula pc[y] = L. For such an environment condition E_i, the fairness condition ¬(from_i = false ∧ e_i = 1) excludes the case that there are environment processes satisfying E_i(x, y) which never move. For a more detailed explanation and proofs, please consult the full version.

5 Computing the Abstract Model

In our implementation, we consider protocols in which all inter-predicates EP_i(x, y) have the form t[x] ≺ t[y], where ≺ ∈ {<, >, =} and t is a data variable.² Thus, each local process compares its own variables only with their counterparts in other processes. Most real protocols satisfy this condition. Our results, however, do not depend on this particular choice of inter-predicates.

² The incrementation operation occurs only on the right-hand side of assignments in update transitions.

Computing the abstract transition relation is evidently complicated by the fact that there is an infinite number of concrete systems. To get around this problem, we consider each concrete transition statement of the program separately and over-approximate the set of abstract transitions it can lead to. Their union will be our abstract transition relation. A concrete transition can be either a guarded transition or an update transition. Each transition can be executed by the reference process or by one of the environment processes. Thus there are four cases to consider:

    Active process is ...          guarded transition   update transition
    ... the reference process      Case 1               Case 2
    ... an environment process     Case 3               Case 4

In this section we will consider Case 1, that is, the reference process executing a guarded transition. Computing the abstract transitions in the other cases is similar in spirit but quite lengthy. We refer the reader to the full version [10] of this paper for a more detailed description of how we compute the abstract initial condition and the abstract transition relation.

Let us now turn to Case 1 in detail, and consider the guarded transition

    L1 : if ∀otr ≠ slf. G(slf, otr) then goto L2 else goto L3.    (∗)

Suppose the reference process is executing this guarded transition statement.
If at least one of the environment processes contradicts the guard G, then the reference process transitions to control location L3, i.e., takes the else branch. Otherwise, the reference process goes to L2. We will now formalize the conditions under which the if and else branches are taken.

Definition 6 (Blocking Set for Reference Process). Let G = ∀otr ≠ slf. G(slf, otr) be a guard. We say that an environment condition E_i(x, y) blocks the guard G if E_i(x, y) ⇒ ¬G(x, y). The set B^x(G) of all indices i such that E_i(x, y) blocks G is called the blocking set of the reference process for guard G.

Note that by Fact 3, either E_i(x, y) ⇒ ¬G(x, y) or E_i(x, y) ⇒ G(x, y) for every environment condition E_i(x, y). The intuitive idea behind the definition is that B^x(G) contains the indices of all environment conditions which enforce the else branch.

We will now explain how to represent the guarded transition (∗) in the abstract model: we introduce an abstract transition from ŝ_1 = ⟨pc, e_1, .., e_T, from_1, .., from_T, to_1, .., to_T⟩ to ŝ_2 = ⟨pc′, e′_1, .., e′_T, from′_1, .., from′_T, to′_1, .., to′_T⟩ if

1. pc = L1, i.e., the reference process is in location L1;
2. one of the following two conditions holds:
   – If Branch: ∀i ∈ B^x(G). (e_i = 0) and pc′ = L2, i.e., the guard is true and the reference process moves to control location L2;
   – Else Branch: ¬∀i ∈ B^x(G). (e_i = 0) and pc′ = L3, i.e., the guard is false and the reference process moves to control location L3;
3. all the variables from′_1, .., from′_T and to′_1, .., to′_T are false, expressing that none of the environment processes changes its state.

Thus, in order to compute the abstract transition we just need to find the blocking set B^x(G). This task is easy for predicates involving only linear order.
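To illustrate just how easy, the following sketch (entirely ours; the guard shown is a made-up example) computes B^x(G) for inter-predicates over a linear order by evaluating the guard once per environment condition:

    from itertools import product

    # Blocking set B^x(G) of Definition 6 for inter-predicates of the form
    # t[x] < t[y], t[x] = t[y], t[x] > t[y]. Each environment condition
    # fixes pc[y] and one relationship; by Fact 3 it either implies the
    # guard G(x, y) or implies its negation.
    S = [1, 2, 3]                           # control locations
    RELS = ['<', '=', '>']                  # mutually exclusive relationships

    def blocking_set(guard):
        """Indices i of environment conditions E_i(x, y) with E_i => not G."""
        conds = list(product(RELS, S))      # E_1 .. E_T with T = 3 * |S|
        return {i for i, (rel, loc) in enumerate(conds) if not guard(rel, loc)}

    # Hypothetical guard: forall otr != slf . t[slf] > t[otr] or pc[otr] = 1,
    # where rel describes how t[x] compares to t[y].
    g = lambda rel, loc: rel == '>' or loc == 1
    print(blocking_set(g))                  # {1, 2, 4, 5}
    # The abstract if-branch is then enabled in <pc, e_1, ..., e_T> iff
    # e_i = 0 for every i in blocking_set(g).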
6 Experimental Results

We have implemented a prototype of our abstraction method in Java. As argued above, our implementation handles protocols in which all the predicates appearing in the guards involve only {<, >, =}. Thus, in this preliminary implementation, the decision problems that arise during the abstraction are simple and are handled by our abstraction program internally. We verified the safety and liveness properties of Szymanski's mutual exclusion protocol and Lamport's bakery algorithm. These two protocols have an intricate combinatorial structure and have been used widely as benchmarks for parameterized verification. For safety properties, we verified that no two processes can be present in the critical section at the same time. For liveness, we verified the property that if a process wishes to enter the critical section then it eventually will. We used the NuSMV model checker to verify the finite abstract model. The model checking times are shown in Figure 2. The abstraction time is negligible, less than 0.1s. Figure 2 also shows the number of predicates and the size of the reachable state space as reported by NuSMV. All experiments were run on a 2.4 GHz Pentium machine with 512 MB main memory.

                 Inter-preds  Intra-preds  Reachable states  Safety   Liveness
    Szymanski         1            8          O(2^14)         0.1s      1.82s
    Bakery            3            5          O(2^146)       68.55s    755.0s

    Fig. 2. Running Times

7 Conclusion

We have enriched predicate abstraction with ideas from counter abstraction to develop a new framework for verifying parameterized systems. We have applied this method to verify, under the atomicity assumption, the safety and liveness properties of two well-known mutual exclusion protocols.

The main focus of this paper was the verification of small but very intricate systems. In these systems, the challenge is to handle the tightly intertwined execution of an unbounded number of processes and to maintain predicates which span multiple processes. At the heart of our approach lies a notion of abstraction – environment abstraction – which describes the status of a concurrent system from the point of view of a single process. In addition to safety properties, environment abstraction naturally allows us to verify fairness properties as well.

The framework presented in this paper is a specific instance of environment abstraction tailored for distributed mutual exclusion algorithms. The general approach can be naturally extended in several ways:

– In this paper, the internal state of a process is described by a control location pc = L. In a more general framework, the state of a process can be described using additional predicates which relate the different data variables of one process. This extension is quite straightforward but omitted from the current paper for the sake of simplicity.
– We have also extended the method to deal with systems in which there is a central process in addition to the K local processes. This extension allows us to handle directory-based cache coherence protocols and will be reported in future work.
– The most important improvement of our results concerns the elimination of the atomicity assumption, so as to achieve automated protocol verification in a non-simplified setting for the first time. We have recently reached this goal by an extension of environment abstraction. We will report these results in future work.

To conclude, we want to emphasize that viewing a concurrent system from the point of view of a single process closely matches the reasoning involved in designing a distributed algorithm. We therefore believe that environment abstraction naturally yields powerful system abstractions.

Acknowledgments

The authors are grateful to the anonymous referees, Ken McMillan, and Lenore Zuck for discussions and comments which helped to improve the presentation of this paper.

References

1. P. A. Abdulla, B. Jonsson, M. Nilsson, and J. d'Orso. Regular model-checking made simple and efficient. In Proc. 13th International Conference on Concurrency Theory (CONCUR), 2002.
2. K. Apt and D. Kozen. Limits for automatic verification of finite state concurrent systems. Information Processing Letters, 15:307–309, 1986.
3. T. Arons, A. Pnueli, S. Ruah, and L. Zuck. Parameterized verification with automatically computed inductive assertions. In Proc. 13th Intl. Conf. Computer Aided Verification (CAV), 2001.
4. T. Ball, S. Chaki, and S. Rajamani. Verification of multi-threaded software libraries. In ICSE, 2001.
5. K. Baukus, S. Bensalem, Y. Lakhnech, and K. Stahl. Abstracting WS1S systems to verify parameterized networks. In Proc. TACAS, 2000.
6. K. Baukus, Y. Lakhnech, and K. Stahl. Verification of parameterized protocols. In Journal of Universal Computer Science, 2001.
7. B. Boigelot, A. Legay, and P. Wolper. Iterating transducers in the large. In 15th Intern. Conf. on Computer Aided Verification (CAV'03). LNCS, Springer-Verlag, 2003.
8. A. Bouajjani, B. Jonsson, M. Nilsson, and T. Touili. Regular model checking. In 12th Intern. Conf. on Computer Aided Verification (CAV'00). LNCS, Springer-Verlag, 2000.
9. M. C. Browne, E. M. Clarke, and O. Grumberg. Reasoning about networks with many identical finite state processes.
Information and Computation, 81:13–31, 1989.
10. E. Clarke, M. Talupur, and H. Veith. Environment abstraction for parameterized verification. Full version, available at www.cs.cmu.edu/~tmurali/vmcai06.ps.
11. E. M. Clarke, T. Filkorn, and S. Jha. Exploiting symmetry in temporal model checking. In Proc. 5th Intl. Conf. Computer Aided Verification (CAV), 1993.
12. G. Delzanno. Automated verification of cache coherence protocols. In Computer Aided Verification 2000 (CAV 00), 2000.
13. E. A. Emerson and V. Kahlon. Model checking guarded protocols. In Eighteenth Annual IEEE Symposium on Logic in Computer Science (LICS), pages 361–370, 2003.
14. E. A. Emerson, J. Havlicek, and R. Trefler. Virtual symmetry. In 15th Annual IEEE Symposium on Logic in Computer Science (LICS), 2000.
15. E. A. Emerson and A. P. Sistla. Utilizing symmetry when model-checking under fairness assumptions: An automata-theoretic approach. TOPLAS, 4, 1997.
16. E. A. Emerson and A. P. Sistla. Symmetry and model checking. In Proc. 5th Intl. Conf. Computer Aided Verification (CAV), 1993.
17. E. A. Emerson and R. Trefler. From asymmetry to full symmetry. In CHARME, 1999.
18. Y. Fang, N. Piterman, A. Pnueli, and L. Zuck. Liveness with incomprehensible ranking. In Proc. VMCAI, 2004.
19. Y. Fang, N. Piterman, A. Pnueli, and L. Zuck. Liveness with invisible ranking. In Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS), 2004.
20. S. M. German and A. P. Sistla. Reasoning about systems with many processes. Journal of the ACM, 39, 1992.
21. S. Graf and H. Saidi. Construction of abstract state graphs with PVS. In O. Grumberg, editor, Proc. CAV'97, volume 1254, pages 72–83. Springer-Verlag, 1997.
22. T. Henzinger, R. Jhala, and R. Majumdar. Race checking with context inference. In Proceedings of the International Conference on Programming Language Design and Implementation (PLDI), 2004.
23. Y. Kesten, O. Maler, M. Marcus, A. Pnueli, and E. Shahar. Symbolic model checking with rich assertional languages. In Proc. CAV'97, volume 1254 of LNCS, pages 424–435. Springer, June 1997.
24. S. K. Lahiri and R. Bryant. Constructing quantified invariants. In Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS), 2004.
25. S. K. Lahiri and R. Bryant. Indexed predicate discovery for unbounded system verification. In Proc. 16th Intl. Conf. Computer Aided Verification (CAV), 2004.
26. L. Lamport. A new solution of Dijkstra's concurrent programming problem. Communications of the ACM, 17(8):453–455, 1974.
27. K. L. McMillan, S. Qadeer, and J. B. Saxe. Induction in compositional model checking. In Conference on Computer Aided Verification, pages 312–327, 2000.
28. A. Pnueli, S. Ruah, and L. Zuck. Automatic deductive verification with invisible invariants. In Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS), 2001.
29. A. Pnueli, J. Xu, and L. Zuck. Liveness with (0, 1, ∞) counter abstraction. In Computer Aided Verification 2002 (CAV 02), 2002.
30. I. Suzuki. Proving properties of a ring of finite state machines. Information Processing Letters, 28:213–214, 1988.
31. B. K. Szymanski. A simple solution to Lamport's concurrent programming problem with linear wait. In Proc. International Conference on Supercomputing Systems, 1988.

Error Control for Probabilistic Model Checking

Håkan L.S. Younes*

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA

* Supported in part by the US Army Research Office (ARO), under contract no. DAAD190110485.

Abstract.
We introduce a framework for expressing correctness guarantees of model-checking algorithms. The framework allows us to qualitatively compare different solution techniques for probabilistic model checking, both techniques based on statistical sampling and techniques based on numerical computation of probability estimates. We provide several new insights into the relative merits of the different approaches. In addition, we present a new statistical solution method that can bound the probability of error under any circumstances by sometimes reporting undecided results. Previous statistical solution methods could only bound the probability of error outside of an "indifference region."

1 Introduction

Probabilistic model checking, based on the model-checking paradigm pioneered by Clarke and Emerson [4], is a technique for automated verification of stochastic processes. Given a model of a stochastic process, for example a Markov chain, the model-checking task is to determine whether the model satisfies some property Φ. For instance, consider a queuing system with random (according to some distribution) arrivals and departures. We may ask whether the probability is at most 0.5 that the queue will become full in the next hour of operation. This is an example of a probabilistic time-bounded property. Techniques for verifying such properties of stochastic discrete-event systems without nondeterminism are the focus of this paper.

Algorithms for probabilistic model checking of time-bounded properties come in two flavors: numerical [3, 12] and statistical [19, 8, 15, 17]. The former rely on numerical algorithms for probability computations, while the latter use statistical sampling and discrete-event simulation to assess the validity of probabilistic properties. Some insights into the relative merits of the two approaches are given by Younes et al. [18]. Yet, a direct comparison is difficult because numerical and statistical techniques provide quite different correctness guarantees. Furthermore, conflicting claims have been made about the benefits of competing statistical solution methods. Hérault et al. [8] state that their solution method, based on statistical estimation, is better than the method of Younes and Simmons [19], based on hypothesis testing, because the sample size of the former method is known exactly. Sen et al. [15] provide empirical data that seem to suggest that hypothesis testing with fixed-size samples consistently outperforms sequential hypothesis testing (the latter being advocated by Younes et al. [18]).

This paper is an attempt to set the record straight regarding the relative merits of different solution methods for probabilistic model checking. We establish a framework for expressing the correctness guarantees of model-checking algorithms (Sect. 3). Section 4 shows how to connect the truncation error, ε, of numerical methods with the parameter δ (the half-width of the "indifference region") of statistical methods. We conclude that numerical and statistical solution methods can, indeed, be interpreted as solving the same problem. Statistical solution methods are simply randomized algorithms for the same problems that numerical methods solve.
We are also able to prove that statistical estimation, when used for probabilistic model checking, reduces to hypothesis testing with fixed-size samples. It follows that Younes and Simmons' solution method never needs to use a larger sample size than Hérault et al.'s estimation-based method, and it will often use a much smaller sample size to achieve the same correctness guarantees. Our framework for error control also helps us understand the results of Sen et al., which seem to contradict results presented by Younes [17].

The second contribution of this paper is a new statistical method for probabilistic model checking. Current statistical solution methods provide bounds for the probability of error only when a formula holds (or does not hold) with some margin. Our new method bounds the probability of error under all circumstances. This is accomplished by permitting an undecided result. The idea of undecided results for statistical solution methods is due to Sen et al. [15], but they did not provide any mechanisms for bounding the probability of producing an undecided result (or even an incorrect result, for that matter). Section 5 shows, for the first time, how to bound the probability of undecided and incorrect results for any time-bounded formula, including conjunctions of probabilistic statements and nested probabilistic statements. Section 6 discusses the computational complexity of statistical solution methods in general. A brief empirical evaluation of the new statistical solution method is provided in Sect. 7.

2 Probabilistic Model Checking

This section describes stochastic discrete-event systems (without nondeterminism), which is the class of models that we consider for probabilistic model checking. A logic, UTSL, for expressing properties of such models is introduced. We describe the semantics of UTSL and of UTSLδ, the latter being a relaxation of the former logic that permits practical model-checking algorithms.

2.1 Stochastic Discrete-Event Systems

A stochastic discrete-event system is any stochastic process that can be thought of as occupying a single state for a duration of time before an event causes an instantaneous state transition to occur. The canonical example is a queuing system, with the state being the number of items currently in the queue. The state changes at the occurrence of an arrival or departure event.

The evolution of a stochastic discrete-event system over time is captured by a trajectory. The trajectory of a stochastic discrete-event system is piecewise constant and can be represented as a sequence σ = {s_0, t_0, s_1, t_1, . . .}, with s_i ∈ S and t_i > 0. Let

    T_0 = 0 and T_i = \sum_{j=0}^{i-1} t_j for i > 0,    (1)

so that T_i is the time at which state s_i is entered and t_i is the duration of time for which the process remains in s_i before an event triggers a transition to state s_{i+1}. It is assumed that lim_{i→∞} T_i = ∞. This implies that only a finite number of events can trigger in a finite interval of time, which is a reasonable assumption for any physical process (cf. [1]).

A measurable stochastic discrete-event system is a triple M = ⟨S, T, µ⟩, where S is the state space, T is the time domain (ℤ* for discrete-time models and [0, ∞) for continuous-time models), and µ is a probability measure over sets of trajectories with common prefix. A prefix of σ = {s_0, t_0, s_1, t_1, . . .} is a sequence σ≤τ = {s′_0, t′_0, . . . , s′_k, t′_k}, with s′_i = s_i for all i ≤ k, \sum_{i=0}^{k} t′_i = τ, t′_i = t_i for all i < k, and t′_k < t_k.
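As a concrete illustration of these definitions (the code and the toy queue model are ours), one can sample a piecewise-constant trajectory, compute the entry times T_i of (1), and extract a prefix:

    import itertools, random

    def sample_trajectory(steps, rate=1.0, seed=0):
        """A trajectory {s0, t0, s1, t1, ...} of a toy queue: each event is
        an arrival or departure, with exponential holding times t_i > 0."""
        rng, s, traj = random.Random(seed), 0, []
        for _ in range(steps):
            traj.append((s, rng.expovariate(rate)))
            s = max(0, s + rng.choice([1, -1]))
        return traj

    def entry_times(traj):
        """T_i of equation (1): T_0 = 0 and T_i = t_0 + ... + t_{i-1}."""
        return [0.0] + list(itertools.accumulate(t for _, t in traj))[:-1]

    def prefix(traj, tau):
        """The prefix sigma_{<=tau}: the durations sum to tau, so the last
        holding time is truncated."""
        out, acc = [], 0.0
        for s, t in traj:
            if acc + t >= tau:
                return out + [(s, tau - acc)]
            out.append((s, t))
            acc += t
        return out                          # tau lies beyond the sampled horizon

    traj = sample_trajectory(10)
    print(entry_times(traj)[:4])
    print(prefix(traj, 2.5))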
Let Path(σ≤τ) denote the set of trajectories with common prefix σ≤τ. This set must be measurable for probabilistic model checking to have meaning, and its measure is determined by µ. The exact definition of µ depends on the structure of the process. Baier et al. [3] provide a definition for continuous-time Markov chains, and Younes [17] discusses the construction of a probability space for trajectories of stochastic discrete-event systems in general (see also Segala's [14] definition of trace distributions).

2.2 UTSL: The Unified Temporal Stochastic Logic

A stochastic discrete-event system is a triple ⟨S, T, µ⟩. We assume a factored representation of S, with a set of state variables SV and a value assignment function V(s, x) providing the value of x ∈ SV in state s. The domain of x is the set D_x = ∪_{s∈S} V(s, x) of possible values that x can take on. We define the syntax of UTSL for a factored stochastic discrete-event system M = ⟨S, T, µ, SV, V⟩ as

    Φ ::= x ∼ v | ¬Φ | Φ ∧ Φ | P⋈θ[Φ U^I Φ] ,

where x ∈ SV, v ∈ D_x, ∼ ∈ {≤, =, ≥}, θ ∈ [0, 1], ⋈ ∈ {≤, ≥}, and I ⊂ T. Additional UTSL formulae can be derived in the usual way. For example, ⊥ ≡ (x = v) ∧ ¬(x = v) for some x ∈ SV and v ∈ D_x, ⊤ ≡ ¬⊥, Φ ∨ Ψ ≡ ¬(¬Φ ∧ ¬Ψ), Φ → Ψ ≡ ¬Φ ∨ Ψ, and P<θ[ϕ] ≡ ¬P≥θ[ϕ]. The standard logic operators have their usual meaning. P⋈θ[ϕ] asserts that the probability measure over the set of trajectories satisfying the path formula ϕ is related to θ according to ⋈. Path formulae are constructed using the temporal path operator U^I ("until"). The path formula Φ U^I Ψ asserts that Ψ becomes true t ∈ I time units into the future while Φ holds continuously prior to t. The validity of a UTSL formula is inductively defined as follows:

    M, {s_0, t_0, . . . , s_k, t_k} |= x ∼ v   if V(s_k, x) ∼ v
    M, σ≤τ |= ¬Φ                               if M, σ≤τ ⊭ Φ
    M, σ≤τ |= Φ ∧ Ψ                            if (M, σ≤τ |= Φ) ∧ (M, σ≤τ |= Ψ)
    M, σ≤τ |= P⋈θ[ϕ]                           if µ({σ ∈ Path(σ≤τ) | M, σ, τ |= ϕ}) ⋈ θ
    M, σ, τ |= Φ U^I Ψ                         if ∃t ∈ I. (M, σ≤τ+t |= Ψ) ∧ ∀t′ ∈ T. (t′ < t) → (M, σ≤τ+t′ |= Φ)

The semantics of Φ U^I Ψ requires that Φ holds continuously, i.e. at every point in time, along a trajectory until Ψ is satisfied. For Markov chains, it is sufficient to consider time points at which state transitions occur. The semantics of UTSL therefore coincides with the semantics of Hansson and Jonsson's [7] PCTL interpreted over discrete-time Markov chains and Baier et al.'s [3] CSL interpreted over continuous-time Markov chains. For non-Markovian models, however, the validity of Φ or Ψ may vary over time in the same state if these formulae contain probabilistic operators. Because of this, the statistical solution method for probabilistic model checking presented in this paper is restricted to Markov chains for properties with nested probabilistic operators. Without nesting, the method does not rely on this restriction. We typically want to know whether a property Φ holds for a model M if execution starts in a specific state s. A model-checking problem ⟨M, s, Φ⟩ has an affirmative answer if and only if M, {s, 0} |= Φ.
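Statistical solution methods, discussed in Sect. 4, decide such problems by verifying the path formula over sampled trajectories. For state predicates without nested probabilistic operators this check is a simple scan; the following minimal sketch (ours) checks Φ U^[lo,hi] Ψ over a trajectory represented as in Sect. 2.1:

    def check_until(traj, phi, psi, lo, hi):
        """Does {s0, t0, s1, t1, ...} satisfy phi U^[lo,hi] psi, for state
        predicates phi and psi (no nested probabilistic operators)?"""
        T = 0.0                             # entry time of the current state
        for s, t in traj:
            if T > hi:
                return False                # deadline passed before psi held
            # psi can be witnessed while in s if [T, T+t) meets [lo, hi];
            # if s is entered before lo, phi must also hold in s until lo.
            if psi(s) and T + t > lo and (T >= lo or phi(s)):
                return True
            if not phi(s):
                return False                # phi must hold continuously
            T += t
        return False

    # "The queue stays below 4 until it empties, within 10 time units."
    traj = [(2, 1.0), (3, 0.5), (2, 2.0), (1, 1.5), (0, 3.0)]
    print(check_until(traj, lambda s: s < 4, lambda s: s == 0, 0.0, 10.0))  # True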
2.3 UTSLδ: UTSL with Indifference Regions

Consider the model-checking problem ⟨M, s, P⋈θ[ϕ]⟩ and let p be the probability measure for the set of trajectories that start in s and satisfy ϕ. If p is "sufficiently close" to θ, then it is likely to make little difference to a user whether or not P⋈θ[ϕ] is reported to hold by a model-checking algorithm. To formalize this idea, we introduce UTSLδ as a relaxation of UTSL. With each formula of the form P⋈θ[ϕ], we associate an indifference region centered around θ with half-width δ. If |p − θ| < δ, then the truth value of P⋈θ[ϕ] is undetermined for UTSLδ; otherwise, it is the same as for UTSL. The formal semantics of UTSLδ is given by a satisfaction relation |≈δ and an unsatisfaction relation |≈δ⊥. For standard logic formulae, |≈δ replaces |= and |≈δ⊥ replaces ⊭. For probabilistic formulae we have the following rules:

    M, σ≤τ |≈δ  P≥θ[ϕ]   if µ({σ ∈ Path(σ≤τ) | M, σ, τ |≈δ  ϕ}) ≥ θ + δ
    M, σ≤τ |≈δ⊥ P≥θ[ϕ]   if µ({σ ∈ Path(σ≤τ) | M, σ, τ |≈δ⊥ ϕ}) ≥ 1 − (θ − δ)
    M, σ≤τ |≈δ  P≤θ[ϕ]   if µ({σ ∈ Path(σ≤τ) | M, σ, τ |≈δ  ϕ}) ≤ θ − δ
    M, σ≤τ |≈δ⊥ P≤θ[ϕ]   if µ({σ ∈ Path(σ≤τ) | M, σ, τ |≈δ⊥ ϕ}) ≤ 1 − (θ + δ)

A model-checking problem ⟨M, s, Φ⟩ may very well belong to neither of the two relations |≈δ and |≈δ⊥, in which case the problem is considered "too close to call."

3 Error Control

This section discusses error control for model-checking algorithms in general terms. The discussion establishes ideal conditions for the correctness guarantees of a model-checking algorithm. These conditions are used as a point of reference in later sections when we discuss error control in practical algorithms for probabilistic model checking.

Given a model-checking problem ⟨M, s, Φ⟩ and a model-checking algorithm A, let M, s ⊢⊤ Φ represent the fact that Φ is accepted as true by A and M, s ⊢⊥ Φ that Φ is rejected as false by A (for the remainder of the paper we will leave out M from relations for the sake of brevity). Ideally, we would like the probability to be low that A produces an incorrect answer. More precisely, the probability of a false negative should be at most α and the probability of a false positive at most β, as expressed by the following conditions:

    Pr[s ⊢⊥ Φ | s |= Φ] ≤ α    (2)
    Pr[s ⊢⊤ Φ | s ⊭ Φ] ≤ β    (3)

In addition, the probability should be low that A does not produce a definite answer. Let s ⊢? Φ denote that A is undecided. We add

    Pr[s ⊢? Φ] ≤ γ    (4)

to represent this requirement. Finally, A should always terminate with one of the three possible answers (accept, reject, or undecided):

    Pr[(s ⊢⊤ Φ) ∨ (s ⊢⊥ Φ) ∨ (s ⊢? Φ)] = 1    (5)

A model-checking algorithm that satisfies (2) through (5) is guaranteed to produce a correct answer with probability at least 1 − α − γ when Φ holds and 1 − β − γ when Φ does not hold. To make these probabilities high, α, β, and γ need to be low. If all three parameters are zero, then A is a deterministic algorithm for probabilistic model checking. If both α + γ and β + γ are less than 0.5, but non-zero, then A is a randomized algorithm for probabilistic model checking. Unfortunately, it is generally not possible, in practice, to satisfy all four conditions with low values for all three parameters. Next, we will discuss how these conditions are relaxed by current solution methods, and then we will present a new statistical solution method based on an alternative relaxation.

4 Current Solution Methods

Current solution methods, both numerical and statistical, can be seen as relying on a relaxation of (2) and (3) to become tractable. The reference point for error is changed from UTSL to UTSLδ semantics, replacing (2) and (3) with:

    Pr[s ⊢⊥ Φ | s |≈δ Φ] ≤ α    (6)
    Pr[s ⊢⊤ Φ | s |≈δ⊥ Φ] ≤ β    (7)

4.1 Statistical Hypothesis Testing

The predominant statistical solution method for verifying P⋈θ[ϕ] in a single state s is based on statistical hypothesis testing.
This method was first proposed by Younes and Simmons [19] and further refined by Younes [17]. The approach always produces a definite result (γ = 0). This ensures a high probability of a correct answer when either s |≈δ Φ or s |≈δ⊥ Φ holds.

Let Φ be P≥θ[ϕ], let p be the probability measure of the set of trajectories that start in s and satisfy ϕ, and let the Xi be Bernoulli variates with Pr[Xi = 1] = p. To verify Φ we test the hypothesis H0: p ≥ θ + δ against the alternative hypothesis H1: p ≤ θ − δ based on observations of the Xi (each observation is the result of verifying ϕ over a sample trajectory starting in s). Note that H0 corresponds to s |≈δ Φ and H1 corresponds to s |≈δ⊥ Φ. If we take acceptance of H0 to mean acceptance of Φ as true and acceptance of H1 to mean rejection of Φ as false, then we can use acceptance sampling to verify Φ. Acceptance sampling is a well-established technique for statistical hypothesis testing. An acceptance sampling test with strength ⟨α, β⟩ guarantees that H1 is accepted with probability at most α when H0 holds, and that H0 is accepted with probability at most β when H1 holds. Hence, we can use such a test to satisfy (6) and (7) for the verification of Φ.

Any acceptance sampling test with the prescribed strength can be used. A straightforward approach is to use a fixed number n of observations x1, …, xn of the Bernoulli variates X1, …, Xn and to pick a constant c. If $\sum_{i=1}^{n} x_i$ is greater than c, then H0 is accepted; otherwise H1 is accepted. The pair ⟨n, c⟩ is called a single sampling plan [5]. The sum of n Bernoulli variates with parameter p has a binomial distribution with cumulative distribution function

  $F(c; n, p) = \sum_{i=0}^{c} \binom{n}{i} p^i (1-p)^{n-i}$ .   (8)

Using a single sampling plan ⟨n, c⟩ we accept hypothesis H1 with probability F(c; n, p) and hypothesis H0 with probability 1 − F(c; n, p). To achieve strength ⟨α, β⟩ we need to choose n and c so that F(c; n, θ + δ) ≤ α and 1 − F(c; n, θ − δ) ≤ β. For optimal performance we choose n and c so that n is minimized. There is, in general, no closed-form solution for n. Younes [17] describes an algorithm based on binary search that finds an optimal single sampling plan.

The sample size of a single sampling plan is fixed and therefore independent of the actual observations made. It is often possible to reduce the expected sample size required to achieve a desired test strength by taking the observations into account as they are made. This is called sequential acceptance sampling. Wald's [16] sequential probability ratio test (SPRT) is a particularly efficient sequential test. The reduction in expected sample size, compared to a single sampling plan, is often substantial, although there is no fixed upper bound on the sample size. The SPRT is carried out as follows. At the mth stage, i.e. after making m observations x1, …, xm, we calculate the quantity

  $f_m = \prod_{i=1}^{m} \frac{\Pr[X_i = x_i \mid p = p_1]}{\Pr[X_i = x_i \mid p = p_0]} = \frac{p_1^{d_m}(1-p_1)^{m-d_m}}{p_0^{d_m}(1-p_0)^{m-d_m}}$ ,   (9)

where $d_m = \sum_{i=1}^{m} x_i$, p0 = θ + δ, and p1 = θ − δ. Hypothesis H0 is accepted if fm ≤ β/(1 − α), and hypothesis H1 is accepted if fm ≥ (1 − β)/α. Otherwise, additional observations are made until either termination condition is satisfied.
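To make the test concrete, the following sketch implements the SPRT decision rule by accumulating the logarithm of the ratio fm from (9). It is an illustrative Python rendering under the stated hypotheses H0: p ≥ θ + δ and H1: p ≤ θ − δ, not the implementation evaluated in this paper:

  from math import log

  def sprt(observations, theta, delta, alpha, beta):
      # Wald's SPRT for H0: p >= p0 = theta + delta vs. H1: p <= p1 = theta - delta.
      # Each observation is 1 if the path formula held on a sample trajectory, else 0.
      p0, p1 = theta + delta, theta - delta
      log_accept_h0 = log(beta / (1 - alpha))   # accept H0 once log f_m <= this bound
      log_accept_h1 = log((1 - beta) / alpha)   # accept H1 once log f_m >= this bound
      log_fm = 0.0
      for m, x in enumerate(observations, start=1):
          # incremental update of log f_m, following (9)
          log_fm += log(p1 / p0) if x == 1 else log((1 - p1) / (1 - p0))
          if log_fm <= log_accept_h0:
              return ('accept: Phi holds', m)
          if log_fm >= log_accept_h1:
              return ('reject: Phi does not hold', m)
      return ('no decision yet: more observations needed', len(observations))

In a model checker the observation stream would be produced by simulating trajectories on demand; the loop stops as soon as either threshold is crossed, which is the source of the reduction in expected sample size.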
4.2 Statistical Estimation

An alternative statistical solution method, based on estimation instead of hypothesis testing, has been developed by Lassaigne and Peyronnet [13]. Hérault et al. [8] provide more details of this approach. As before, let Φ be P≥θ[ϕ] and p the probability measure of the set of trajectories that start in s and satisfy ϕ. This approach uses n observations x1, …, xn to compute an estimate of p: $\tilde{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The estimate is such that

  $\Pr[\,|\tilde{p} - p| < \delta\,] \ge 1 - \alpha$ .   (10)

Using a result derived by Hoeffding [10, Theorem 1], it can be shown that

  $n = \left\lceil \frac{1}{2\delta^2} \ln \frac{2}{\alpha} \right\rceil$   (11)

is sufficient to satisfy (10). If we accept Φ as true when p̃ ≥ θ and reject Φ as false otherwise, then it follows from (10) that the answer is correct with probability at least 1 − α if either s |≈δ Φ or s |≈δ⊥ Φ holds. Consequently, the verification procedure satisfies (6) and (7) with β = α. As with the solution method based on hypothesis testing, a definite answer is always generated (γ = 0).

To compare the estimation-based approach with the approach based on hypothesis testing, let c = ⌈nθ⌉ − 1 and $d = n\tilde{p} = \sum_{i=1}^{n} x_i$. It should be clear that p̃ ≥ θ ⟺ d > c. This means that the estimation-based approach can be interpreted as a single sampling plan ⟨n, c⟩. It follows that the approach proposed by Younes and Simmons [19], when using a single sampling plan, will always be at least as efficient as the estimation-based approach. Typically, it will be more efficient because: (i) the sample size is derived using the true underlying (binomial) distribution rather than a distribution-free bound, (ii) c is not restricted to be ⌈nθ⌉ − 1, and (iii) β ≠ α can be accommodated. The last property, in particular, is important when dealing with conjunctive and nested probabilistic statements. The advantage of hypothesis testing is demonstrated in Table 1. Note, also, that the SPRT can often be used to improve efficiency even further for the approach based on hypothesis testing.

Table 1. Sample sizes for estimation and optimal single sampling plan (δ = 10⁻²)

  θ     α      β      n_est    n_opt    n_est/n_opt
  0.5   10⁻²   10⁻²   26,492   13,527   1.96
  0.5   10⁻⁸   10⁻²   95,570   39,379   2.43
  0.5   10⁻⁸   10⁻⁸   95,570   78,725   1.21
  0.9   10⁻²   10⁻²   26,492    4,861   5.45
  0.9   10⁻⁸   10⁻²   95,570   13,982   6.84
  0.9   10⁻⁸   10⁻⁸   95,570   28,280   3.38
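The sample size (11) is straightforward to evaluate. The following Python fragment (illustrative only) computes it with the natural logarithm and reproduces the n_est column of Table 1:

  from math import ceil, log

  def estimation_sample_size(delta, alpha):
      # n = ceil(ln(2/alpha) / (2*delta^2)), sufficient for (10) by Hoeffding's inequality
      return ceil(log(2 / alpha) / (2 * delta ** 2))

  assert estimation_sample_size(1e-2, 1e-2) == 26492   # rows of Table 1 with alpha = 10^-2
  assert estimation_sample_size(1e-2, 1e-8) == 95570   # rows of Table 1 with alpha = 10^-8

Note that the bound depends only on δ and α, which is why n_est is constant across the θ column of Table 1, while the optimal single sampling plan exploits θ, β, and the binomial distribution itself.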
4.3 Numerical Transient Analysis

To verify the formula P⋈θ[ϕ] in some state s, we can compute p, the probability measure of the set of trajectories that start in s and satisfy ϕ, numerically and test whether p ⋈ θ holds. For time-bounded properties (ϕ = Φ U^[0,τ] Ψ), which are the focus of this paper, such numerical computation is primarily feasible for Markov chains.

Let M be a continuous-time Markov chain. First, as initially proposed by Baier et al. [2], the problem is reduced to transient analysis of a modified Markov chain M′, where all states in M satisfying ¬Φ ∨ Ψ have been made absorbing. Now, p is equal to the probability of occupying a state satisfying Ψ at time τ in the model M′. This probability can be computed using a technique called uniformization, originally proposed by Jensen [11]. Let Q be the generator matrix of M′, q = max_i −q_ii, and P = I + Q/q. Then p can be expressed as follows [3]:

  $p = \mu_0 \cdot \sum_{k=0}^{\infty} e^{-q\tau}\frac{(q\tau)^k}{k!}\, P^k \cdot \chi_\Psi$   (12)

Here, µ0 is a 0-1 row vector with a 1 in the column for the initial state s, and χΨ is a 0-1 column vector with a 1 in each row corresponding to a state that satisfies Ψ. In practice, the infinite summation in (12) is truncated using the techniques of Fox and Glynn [6], so that the truncation error is bounded by ε. If p̃ is the computed probability, then p̃ ≤ p ≤ p̃ + ε. It follows that by accepting P⋈θ[ϕ] as true if p̃ + ε/2 ⋈ θ and rejecting the formula as false otherwise, the numerical solution method satisfies (6) and (7) with δ = ε/2 and α = β = 0. As with the statistical solution methods, a definite answer is always given (γ = 0).

This shows that numerical and statistical solution methods for probabilistic model checking can indeed be viewed as solving the same problem, i.e. UTSLδ model checking rather than UTSL model checking. Statistical solution methods are truly randomized algorithms for UTSLδ model checking.

When using uniformization to verify P≥θ[Φ U^[0,τ] Ψ], it is actually possible to know when we cannot make an informed decision. If we accept the formula as true when p̃ ≥ θ, reject it as false when p̃ + ε < θ, and report "undecided" otherwise, then (2) and (3) can be satisfied with α = β = 0. This alternative implementation of the numerical solution method no longer satisfies (4). That condition is replaced by Pr[s ⊢? Φ | (s |≈δ Φ) ∨ (s |≈δ⊥ Φ)] = 0, with δ = ε, for P≥θ[ϕ] without nested probabilistic operators, and by

  Pr[s ⊢? Φ | (s |≈δ Φ) ∨ (s |≈δ⊥ Φ)] = 0   (13)

for an arbitrary formula Φ. The use of undecided results with numerical methods for probabilistic model checking has been suggested by Hermanns et al. [9], although it is not clear whether any tool implements this approach. The leading tool for probabilistic model checking, PRISM [12], does not produce undecided results.

5 Statistical Solution Method with Undecided Results

Existing statistical solution methods provide no meaningful error bounds if neither s |≈δ Φ nor s |≈δ⊥ Φ holds. This section presents a new statistical solution method that satisfies (2) and (3), so that whenever a definite result is given, the probability of error is bounded. We accomplish this by allowing an undecided result with some probability. The goal is to replace (4) with

  Pr[s ⊢? Φ | (s |≈δ Φ) ∨ (s |≈δ⊥ Φ)] ≤ γ .   (14)

5.1 Probabilistic Operator Without Nesting

Let Φ be P≥θ[ϕ] without nested probabilistic operators (P≤θ[ϕ] is analogous). To satisfy (2), (3), and (14) simultaneously using a sample of size n, we pick two constants c0 and c1 such that 0 ≤ c1 < c0 < n and the following conditions hold:

  F(c1; n, θ) ≤ α   (15)
  1 − F(c1; n, θ − δ) ≤ γ   (16)
  1 − F(c0; n, θ) ≤ β   (17)
  F(c0; n, θ + δ) ≤ γ   (18)

Let $d = \sum_{i=1}^{n} x_i$. Formula Φ is accepted as true if d > c0 and rejected as false if d ≤ c1; otherwise (c1 < d ≤ c0) the result is undecided.

The procedure just given can be interpreted as using two simultaneous acceptance sampling tests. The first is used to test H0⊥: p ≥ θ against H1⊥: p ≤ θ − δ with strength ⟨α, γ⟩. The second is used to test H0⊤: p ≥ θ + δ against H1⊤: p ≤ θ with strength ⟨γ, β⟩. Acceptance of H0⊤ represents acceptance of Φ as true, acceptance of H1⊥ represents rejection of Φ as false, and the remaining two hypotheses represent an undecided result. Combining the results from both tests, Φ is accepted as true if both H0⊤ and H0⊥ are accepted, Φ is rejected as false if both H1⊤ and H1⊥ are accepted, and otherwise the result is undecided. Of course, this means that we do not need to use hypothesis testing with fixed-size samples. We could use any acceptance sampling plans with the prescribed strengths and combine their results as specified. In particular, we could use the SPRT to reduce the expected sample size.
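A brute-force way to obtain such a plan is to search for the smallest n for which constants c0 and c1 satisfying (15) through (18) exist. The sketch below does this with an exact binomial CDF; it is a naive illustration of the conditions, not the optimized search used in practice, and its greedy choice of c0 and c1 may return a different valid plan than the one shown in Fig. 1:

  from math import comb

  def binom_cdf(c, n, p):
      # F(c; n, p) from (8); the empty sum gives F(c) = 0 for c < 0
      return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c + 1))

  def plan_with_undecided(theta, delta, alpha, beta, gamma, n_max=10000):
      for n in range(2, n_max + 1):
          # largest c1 satisfying (15): F(c1; n, theta) <= alpha
          c1 = -1
          while binom_cdf(c1 + 1, n, theta) <= alpha:
              c1 += 1
          # smallest c0 satisfying (17): 1 - F(c0; n, theta) <= beta
          c0 = n
          while c0 > 0 and 1 - binom_cdf(c0 - 1, n, theta) <= beta:
              c0 -= 1
          if (0 <= c1 < c0 < n
                  and 1 - binom_cdf(c1, n, theta - delta) <= gamma   # (16)
                  and binom_cdf(c0, n, theta + delta) <= gamma):     # (18)
              return n, c0, c1
      return None

  # e.g. plan_with_undecided(0.5, 0.1, 0.04, 0.08, 0.1)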
Graphical representations of two acceptance sampling tests with undecided results are shown in Fig. 1 for θ = 0.5, δ = 0.1, α = 0.04, β = 0.08, and γ = 0.1. The horizontal axis represents the number of observations, and the vertical axis represents the number of positive observations. Figure 1(a) represents a sequential version of a single sampling plan with n = 232, c0 = 128, and c1 = 102. The line dm = 129 is the boundary for acceptance of Φ. There is a line for rejection of Φ and two lines defining the boundary of the region that represents an undecided result. Figure 1(b) shows the corresponding decision boundaries for the SPRT.

Fig. 1. Graphical representation of acceptance sampling tests: (a) sequential single sampling plan; (b) SPRT

5.2 Composite Formulae

For a negation ¬Φ we have s ⊢? ¬Φ ⟺ s ⊢? Φ. Hence, if we can satisfy (14) for Φ, then we have the same bound, γ, on the probability of an undecided result for the negation of Φ. The roles of α and β are reversed for negation (cf. Younes and Simmons [19] and Younes [17]).

For a conjunction Φ ∧ Ψ we get the following general bound on the probability of an undecided result:

  Pr[s ⊢? Φ ∧ Ψ | (s |≈δ Φ ∧ Ψ) ∨ (s |≈δ⊥ Φ ∧ Ψ)] ≤ max(γΦ + γΨ, γΦ + βΦ, 2γΨ + βΨ)   (19)

In practice, the dependence on βΦ and βΨ can be disregarded. We have βΦ in (19) because all we can guarantee is Pr[s ⊢⊤ Φ | s |≈δ⊥ Φ] ≤ βΦ (similarly for βΨ), but Pr[s ⊢⊤ Φ | s |≈δ⊥ Φ] is typically negligible compared to this bound: βΦ bounds the acceptance probability at the threshold p = θ, whereas s |≈δ⊥ Φ implies p ≤ θ − δ. Let γ′ = γΦ = γΨ. Then (19) can, for all practical purposes, be replaced by

  Pr[s ⊢? Φ ∧ Ψ | (s |≈δ Φ ∧ Ψ) ∨ (s |≈δ⊥ Φ ∧ Ψ)] ≤ 2γ′ .   (20)

Consequently, if we want to ensure at most a γ probability of an undecided result for Φ ∧ Ψ, and we use the same bound for both conjuncts, then we can use γ/2 when verifying Φ and Ψ. For a conjunction of size n, the symmetric bound for each conjunct can be set to γ/n.

To satisfy (2) we should choose αΦ and αΨ such that αΦ + αΨ ≤ α (cf. Younes and Simmons [19]¹):

  Pr[(s ⊢⊥ Φ) ∨ (s ⊢⊥ Ψ) | (s |= Φ) ∧ (s |= Ψ)] ≤ Pr[s ⊢⊥ Φ | s |= Φ] + Pr[s ⊢⊥ Ψ | s |= Ψ] ≤ αΦ + αΨ   (21)

Similar to γ, we can use α/n when verifying the parts of a conjunction of size n. Unlike γ, however, this does not involve any approximation. To satisfy (3), it suffices to use the same error bound, β, for the individual conjuncts:

  Pr[(s ⊢⊤ Φ) ∧ (s ⊢⊤ Ψ) | (s ⊭ Φ) ∨ (s ⊭ Ψ)] ≤ max(Pr[s ⊢⊤ Φ | s ⊭ Φ], Pr[s ⊢⊤ Ψ | s ⊭ Ψ]) ≤ max(βΦ, βΨ)   (22)

¹ Younes [17] gives the bound min(αΦ, αΨ), but this is a bound only for each individual way of rejecting a conjunction as false. The result due to Younes and Simmons [19] and reproduced here bounds the probability of rejecting a conjunction in any way.
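Operationally, Section 5.2 amounts to a simple three-valued combination rule plus a budget split of the error parameters across conjuncts. The following is a minimal sketch of one natural reading of this; the helper names are illustrative, not from the paper:

  def split_parameters(alpha, beta, gamma, n):
      # per-conjunct parameters for an n-way conjunction: alpha and gamma
      # are split evenly per (21) and (20); beta is reused unchanged per (22)
      return alpha / n, beta, gamma / n

  def combine_conjunction(results):
      # results: per-conjunct answers in {'accept', 'reject', 'undecided'}
      if any(r == 'reject' for r in results):
          return 'reject'        # one false conjunct falsifies the conjunction
      if all(r == 'accept' for r in results):
          return 'accept'
      return 'undecided'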
5.3 Nested Probabilistic Statements

We use acceptance sampling to verify probabilistic statements. The observations used by the acceptance sampling test correspond to the verification of a path formula, ϕ, over sample trajectories. If ϕ contains probabilistic statements, then the observations may be incorrect or undecided. We assume that ϕ can be verified with parameters αϕ, βϕ, and γϕ. This can be accomplished by treating the path formula as a large disjunction of conjunctions, as described by Younes and Simmons [19, p. 231] and Younes [17, p. 78]. It remains to show how to use the verification results for ϕ to verify a probabilistic statement Φ = P≥θ[ϕ] so that (2), (3), and (14) are satisfied. This can be accomplished by a single sampling plan with n, c0, and c1 chosen to satisfy the following conditions:

  F(c1; n, θ(1 − αϕ)) ≤ α   (23)
  1 − F(c1; n, 1 − (1 − (θ − δ))(1 − γϕ − βϕ)) ≤ γ   (24)
  1 − F(c0; n, θ + (1 − θ)βϕ) ≤ β   (25)
  F(c0; n, (θ + δ)(1 − γϕ − αϕ)) ≤ γ   (26)

This assumes that Φ is accepted as true when more than c0 positive observations are made, that Φ is rejected as false when at most c1 of the observations are positive or undecided, and that the result is undecided otherwise. Compared to (15) through (18) for acceptance sampling without nested probabilistic operators, the only difference is that the probability thresholds have been modified. The indifference regions of the two acceptance sampling tests have been made narrower to account for the possibility of erroneous or undecided observations. We can use the same modification with the SPRT.

It should be noted that αϕ, βϕ, and γϕ can be chosen independently of α, β, and γ. The choice of parameters for the verification of ϕ is restricted only by the following conditions:

  1 − (1 − (θ − δ))(1 − γϕ − βϕ) < θ(1 − αϕ)   (27)
  θ + (1 − θ)βϕ < (θ + δ)(1 − γϕ − αϕ)   (28)

The choice of αϕ, βϕ, and γϕ can have a significant impact on performance (cf. the discussion by Younes [17] of the impact of observation error on performance for the standard statistical solution method).
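The threshold arithmetic in (23) through (26), and the feasibility requirements (27) and (28), can be packaged as follows; this sketch covers the arithmetic only, under the reconstruction of (27) given above:

  def nested_test_levels(theta, delta, a_phi, b_phi, g_phi):
      # probability levels at which the two acceptance sampling tests are run
      # when the observations may themselves be wrong or undecided
      lo_h0 = theta * (1 - a_phi)                                  # (23)
      lo_h1 = 1 - (1 - (theta - delta)) * (1 - g_phi - b_phi)      # (24)
      hi_h1 = theta + (1 - theta) * b_phi                          # (25)
      hi_h0 = (theta + delta) * (1 - g_phi - a_phi)                # (26)
      feasible = lo_h1 < lo_h0 and hi_h1 < hi_h0                   # (27), (28)
      return lo_h0, lo_h1, hi_h1, hi_h0, feasible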
6 Complexity of Statistical Solution Methods

The time complexity of any statistical solution method for probabilistic model checking can be understood in terms of two main factors: the sample size and the length of sample trajectories. The sample size depends on the method used for verifying probabilistic statements and on the desired strength. The length of trajectories depends on model characteristics and on the property being verified. An additional factor is simulation effort, which can be both model and implementation dependent.

Consider the formula P⋈θ[Φ U^[0,τ] Ψ] without nested probabilistic operators. Let q be the expected number of state transitions per time unit, let m be the simulation effort per state transition, and let N be the sample size. The time complexity of statistical probabilistic model checking for the given formula is O(q · τ · m · N). The sample size, N, is the only factor that varies between different statistical solution methods, regardless of implementation details. If we use a single sampling plan with strength ⟨α, β⟩ and an indifference region of half-width δ, then N grows logarithmically in 1/α and 1/β and is inversely proportional to δ² [17, p. 23]. We have shown in this paper that the approach based on statistical estimation described by Hérault et al. [8] never uses a smaller sample size than a single sampling plan, given the same parameters, and often uses a much larger sample size. Using the SPRT instead of a single sampling plan can reduce the expected sample size by orders of magnitude in most cases, although the SPRT is not guaranteed always to be more efficient (this is well known in the statistics literature; Younes [17] provides examples of this in the context of model checking). The new statistical approach presented in this paper, which can produce undecided results, has the same time complexity as the old statistical solution method. Given the same α, β, and δ, the new method will require a larger sample size because it is based on acceptance sampling with indifference regions of half-width δ/2, instead of δ for the old method.

Results presented by Sen et al. [15] give the impression that single sampling plans consistently outperform the SPRT. It should be noted, however, that Sen et al. manually selected the sample sizes for their single sampling plans, guided by a desire to achieve a low p-value (K. Sen, personal communication, May 20, 2004). The selected sample sizes are not sufficient to achieve the same strength as was used to produce the results for the SPRT reported by Younes et al. [18], on which they base their comparison. All their empirical evaluation really proves is that a smaller sample size results in shorter verification time, which should surprise no one, but the casual reader may be misled into believing that Sen et al. have devised a more efficient statistical solution method.

7 Empirical Evaluation

The performance of our new statistical solution method is similar to that of the previous statistical solution method, which has been studied and compared with the numerical approach by Younes et al. [18]. We therefore limit the empirical evaluation in this paper to a brief study of the effect that the parameter γ has on performance.

Figure 2 plots the expected sample size, as a function of the (unknown) probability p that a path formula holds, for the SPRT and a sequential single sampling plan (SSSP) with different parameter choices (θ = 0.5, δ = 0.1, α∆ = 0.004, α∇ = 0.04, β∆ = 0.008, β∇ = 0.08, γ∆ = 0.01, and γ∇ = 0.1). The expected sample size is low outside of the indifference region (gray area), especially for the SPRT, and peaks in the indifference region. Note the drop in expected sample size at the threshold θ, where an undecided result is given with high probability. The expected sample size, as a function of p, is similar for other parameter values, with the SPRT almost always outperforming a (sequential) single sampling plan by a wide margin.

Fig. 2. Expected sample size

Now, consider the model-checking problem for an n-station symmetric polling system used by Younes et al. [18]. Each station has a single-message buffer, and the stations are attended by a single server in cyclic order. The server begins by polling station 1. If there is a message in the buffer of station 1, the server starts serving that station. Once station i has been served, or if there is no message at station i when it is polled, the server starts polling station i + 1 (or station 1 if i = n). We verify the property m1 = 1 → P≥0.5[⊤ U^[0,τ] poll1], which states that if station 1 is full, then it is polled within τ time units with probability at least 0.5. We do so in the state where station 1 has just been polled and all buffers are full.

Figure 3 plots the verification time for the symmetric polling system problem (n = 10), as a function of the formula time bound τ, averaged over 100 runs. The plot shows the verification time for the new solution method with γ = 10⁻² (solid curve) and for the old solution method without undecided results (dashed curve); 2δ = 10⁻² and α = β = 10⁻² in both cases. The verification time is lower for the standard statistical solution method, but it produces more erroneous results. Table 2 shows the number of times each result was produced for seven different values of τ. The new statistical solution method does not produce an erroneous result in any of the experiments, while the error probability is high for the standard statistical solution method for values of τ close to 14.251 (where the value of the verified property goes from false to true). Higher reliability in the results is obtained at the cost of efficiency.

Fig. 3. Verification time
Table 2. Result distribution with (bottom) and without (top) undecided results

  result      14.10  14.15  14.20  14.25  14.30  14.35  14.40
  accept          0      3      9     50     88     97    100
  reject        100     97     91     50     12      3      0
  accept          0      0      0      0     32     99    100
  reject        100     99     42      1      0      0      0
  undecided       0      1     58     99     68      1      0

8 Discussion

We have presented a framework for expressing the correctness guarantees of model-checking algorithms. Using this framework, we have shown how current solution methods for probabilistic model checking are related. In particular, we have shown that Younes and Simmons' [19] statistical solution method based on hypothesis testing has clear benefits over Hérault et al.'s [8] estimation-based approach, and that numerical and statistical solution methods can be interpreted as solving the same relaxed model-checking problems. In addition, we have presented a new statistical solution method that bounds the probability of error under all circumstances. This is accomplished by permitting undecided results, and we have shown how to guarantee bounds on the probability of getting an undecided result for any time-bounded formula.

References

1. Alur, R. and Dill, D. L. A theory of timed automata. Theoretical Computer Science, 126(2):183–235, 1994.
2. Baier, C., Haverkort, B. R., Hermanns, H., and Katoen, J.-P. Model checking continuous-time Markov chains by transient analysis. In Proc. 12th International Conference on Computer Aided Verification, volume 1855 of LNCS, pages 358–372. Springer, 2000.
3. Baier, C., Haverkort, B. R., Hermanns, H., and Katoen, J.-P. Model-checking algorithms for continuous-time Markov chains. IEEE Transactions on Software Engineering, 29(6):524–541, 2003.
4. Clarke, E. M. and Emerson, E. A. Design and synthesis of synchronization skeletons using branching time temporal logic. In Proc. 1981 Workshop on Logics of Programs, volume 131 of LNCS, pages 52–71. Springer, 1982.
5. Duncan, A. J. Quality Control and Industrial Statistics. Richard D. Irwin, fourth edition, 1974.
6. Fox, B. L. and Glynn, P. W. Computing Poisson probabilities. Communications of the ACM, 31(4):440–445, 1988.
7. Hansson, H. and Jonsson, B. A logic for reasoning about time and reliability. Formal Aspects of Computing, 6(5):512–535, 1994.
8. Hérault, T., Lassaigne, R., Magniette, F., and Peyronnet, S. Approximate probabilistic model checking. In Proc. 5th International Conference on Verification, Model Checking, and Abstract Interpretation, volume 2937 of LNCS, pages 73–84. Springer, 2004.
9. Hermanns, H., Katoen, J.-P., Meyer-Kayser, J., and Siegle, M. A tool for model-checking Markov chains. International Journal on Software Tools for Technology Transfer, 4(2):153–172, 2003.
10. Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
11. Jensen, A. Markoff chains as an aid in the study of Markoff processes. Skandinavisk Aktuarietidskrift, 36:87–91, 1953.
12. Kwiatkowska, M., Norman, G., and Parker, D. Probabilistic symbolic model checking with PRISM: A hybrid approach. International Journal on Software Tools for Technology Transfer, 6(2):128–142, 2004.
13. Lassaigne, R. and Peyronnet, S. Approximate verification of probabilistic systems. In Proc. 2nd Joint International PAPM-PROBMIV Workshop, volume 2399 of LNCS, pages 213–214. Springer, 2002.
14. Segala, R. Modeling and Verification of Randomized Distributed Real-Time Systems. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1995. MIT-LCS-TR-676.
15. Sen, K., Viswanathan, M., and Agha, G. Statistical model checking of black-box probabilistic systems. In Proc. 16th International Conference on Computer Aided Verification, volume 3114 of LNCS, pages 202–215. Springer, 2004.
16. Wald, A. Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16(2):117–186, 1945.
17. Younes, H. L. S. Verification and Planning for Stochastic Processes with Asynchronous Events. PhD thesis, Computer Science Department, Carnegie Mellon University, 2005. CMU-CS-05-105.
18. Younes, H. L. S., Kwiatkowska, M., Norman, G., and Parker, D. Numerical vs. statistical probabilistic model checking: An empirical study. In Proc. 10th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, volume 2988 of LNCS, pages 46–60. Springer, 2004.
19. Younes, H. L. S. and Simmons, R. G. Probabilistic verification of discrete event systems using acceptance sampling. In Proc. 14th International Conference on Computer Aided Verification, volume 2404 of LNCS, pages 223–235. Springer, 2002.
Field Constraint Analysis

Thomas Wies¹, Viktor Kuncak², Patrick Lam², Andreas Podelski¹, and Martin Rinard²

¹ Max-Planck-Institut für Informatik, Saarbrücken, Germany
  {wies, podelski}@mpi-inf.mpg.de
² MIT Computer Science and Artificial Intelligence Lab, Cambridge, USA
  {vkuncak, plam, rinard}@csail.mit.edu

Abstract. We introduce field constraint analysis, a new technique for verifying data structure invariants. A field constraint for a field is a formula specifying a set of objects to which the field can point. Field constraints enable the application of decidable logics to data structures which were originally beyond the scope of these logics, by verifying the backbone of the data structure and then verifying constraints on fields that cross-cut the backbone in arbitrary ways. Previously, such cross-cutting fields could only be verified when they were uniquely determined by the backbone, which significantly limits the range of analyzable data structures. Field constraint analysis permits non-deterministic field constraints on cross-cutting fields, which allows the verification of invariants for data structures such as skip lists. Non-deterministic field constraints also enable the verification of invariants between data structures, yielding an expressive generalization of static type declarations. The generality of our field constraints requires new techniques. We present one such technique and prove its soundness. We have implemented this technique as part of a symbolic shape analysis deployed in the context of the Hob system for verifying data structure consistency. Using this implementation we were able to verify data structures that were previously beyond the reach of similar techniques.

1 Introduction

The goal of shape analysis [27, Chapter 4], [6, 32, 26, 2, 4, 25, 5, 22] is to verify complex consistency properties of linked data structures. The verification of such properties is important in itself, because the correct execution of the program often requires data structure consistency.
In addition, the information computed by shape analysis is important for verifying other program properties in programs with dynamic memory allocation. Shape analyses based on expressive decidable logics [26, 14, 12] are interesting for several reasons. First, the correctness of such analyses is easier to establish than for approaches based on ad-hoc representations; the use of a decidable logic separates the problem of generating constraints that imply program properties from the problem of solving these constraints. Next, such analyses can be used in the context of assume-guarantee reasoning, because logics provide a language for specifying the behaviors of code fragments. Finally, the decidability of the logics leads to completeness properties for these analyses, eliminating false alarms and making the analyses easier to interact with. We were able to confirm these observations in the context of the Hob system [21, 16] for analyzing data structure consistency, where we have integrated one such shape analysis [26] with other analyses, allowing us to use shape analysis in the context of larger programs: in particular, Hob enabled us to leverage the power of shape analysis, while avoiding the associated performance penalty, by applying shape analysis only to those parts of the program where its extreme precision is necessary.

Our experience with such analyses has also taught us that some of the techniques that make these analyses predictable also make them inapplicable to many useful data structures. Among the most striking examples is the restriction on pointer fields in the Pointer Assertion Logic Engine [26]. This restriction states that all fields of the data structure that are not part of the data structure's tree backbone must be functionally determined by the backbone; that is, such fields must be specified by a formula that uniquely determines where they point. Formally, we have

  ∀x y. f(x) = y ↔ F(x, y)   (1)

where f is a function representing the field, and F is the defining formula for f. The relationship (1) means that, although data structures such as doubly linked lists with backward pointers can be verified, many other data structures remain beyond the scope of the analysis. This includes data structures where the exact value of pointer fields depends on the history of data structure operations, and data structures that use randomness to achieve good average-case performance, such as skip lists [30]. In such cases, the invariant on the pointer field does not uniquely determine where the field points; it merely gives a constraint on the field, of the form

  ∀x y. f(x) = y → F(x, y)   (2)

This constraint is equivalent to ∀x. F(x, f(x)), which states that the function f is a solution of a given binary predicate. The motivation for this paper is to find a technique that supports reasoning about constraints of this more general form. In a search for existing approaches, we have considered structure simulation [11, 9], which, intuitively, allows richer logics to be embedded into existing logics that are known to be decidable, and of which [26] can be viewed as a specific instance. Unfortunately, even general structure simulation requires definitions of the form ∀x y. r(x, y) ↔ F(x, y), where r(x, y) is the relation being simulated. To handle the general case (2), an alternative approach therefore appears to be necessary.
Field constraint analysis. This paper presents field constraint analysis, our approach for analyzing fields with general constraints of the form (2). Field constraint analysis is a proper generalization of the existing approach and reduces to it when the constraint formula F is functional. It is based on approximating the occurrences of f with F, taking into account the polarity of f, and is always sound. It is expressive enough to verify constraints on pointers in data structures such as two-level skip lists. The applicability of our field constraint analysis to non-deterministic field constraints is important because many complex properties have useful non-deterministic approximations. Yet despite this fundamentally approximate nature of field constraints, we were able to prove completeness for some important special cases. Field constraint analysis naturally combines with structure simulation, as well as with a symbolic approach to shape analysis [33, 29]. Our presentation and current implementation are in the context of the monadic second-order logic (MSOL) of trees [13], but our results extend to other logics. We therefore view field constraint analysis as a useful component of shape analysis approaches that makes shape analysis applicable to a wider range of data structures.

Contributions. This paper makes the following contributions:

– We introduce an algorithm (Figure 9) that uses field constraints to eliminate derived fields from verification conditions.
– We prove that the algorithm is both sound (Theorem 1) and, in certain cases, complete. The completeness applies not only to deterministic fields (Theorem 2), but also to the preservation of field constraints themselves over loop-free code (Theorem 3). Theorem 3 implies a complete technique for checking that field constraints hold, if the programmer adheres to a discipline of maintaining them, for instance at the beginning of each loop.
– We describe how to combine our algorithm with symbolic shape analysis [33] to infer loop invariants.
– We describe an implementation and experience in the context of the Hob system for verifying data structure consistency.

The implementation of field constraint analysis as part of the Hob system [21, 16] allows us to apply the analysis to modules of larger applications, with other modules analyzed by more scalable analyses, such as typestate analysis [20]. Additional details (including proofs of theorems) are in [34].

2 Examples

We next explain our field constraint analysis with a set of examples. Note that our analysis handles, as a special case, data structures that have back pointers constrained by deterministic constraints. Such data structures (for instance, doubly linked lists and trees with parent pointers [34]) have also been analyzed by previous approaches [26]. To illustrate the additional power of our analysis, we first present an example illustrating inter-data-structure constraints, which are simple and useful for high-level application properties, but are often non-deterministic. We then present a skip list example, which shows how non-deterministic field constraints arise within data structures, and illustrates how our analysis can synthesize loop invariants.

2.1 Students and Schools

The data structure in our first example contains two linked lists: one containing students and one containing schools (Figure 2). Each Elem object may represent either a student or a school; students have a pointer to the school which they attend.
Both students and schools use the next backbone pointer to indicate the next student or school in the relevant linked list. An invariant of the data structure is that, if an object is in the list of students, then its attends field points to an object in the schools list; that is, it cannot be null and it cannot point to an object outside the list of schools. This invariant is an example of a non-deterministic field constraint: the attends field has a non-trivial constraint, but the target of the field is not uniquely defined in terms of existing fields; instead, this field carries important new information about the school that each student attends.

We implement our example as a module in the Hob system [21], which allows us to specify and, using field constraint analysis, verify the desired data structure invariants and interfaces of data structure operations. In general, a module in Hob consists of three sections: 1) an implementation section (Figure 1) containing declarations of memory cell formats (in this case Elem) and executable code for data structure operations (such as addStudent); 2) a specification section (Figure 3) containing declarations of abstract sets of objects (such as ST for the set of students and SC for the set of schools in the data structure) and interfaces of data structure operations expressed in terms of these abstract sets; and 3) the abstraction section, which gives the abstraction function specifying the definition of the sets (SC and ST) and specifies the representation invariants of the data structure, including field constraints (in this case, on the field attends).

The implementation in Figure 1 states that the addStudent procedure adds a student st to the student list and associates it (via the attends field) with an existing school sc, which is expected to be already in the list of schools. Figure 3 presents the set interface for the addStudent procedure, consisting of a precondition (requires clause), a frame condition (modifies clause), and a postcondition (ensures clause). The precondition states that st must not already be in the list of students ST, and that sc must be in the list of schools. We represent parameters as sets of cardinality at most one (the null object is represented as an empty set). Therefore, the conjuncts card(st)=1 and card(sc)=1 in the precondition indicate that the parameters st and sc are not null. The modifies clause indicates that only the set of students ST, and not the set of schools SC, is modified. The postcondition describes the effect of the procedure: it states that the set of students ST' after procedure execution is equal to the union (denoted +) of the set ST of student objects before procedure execution and (the singleton containing) the given student object st.

Our analysis automatically verifies that the data structure operation addStudent conforms to its interface expressed in terms of abstract sets. Proving the conformance of a procedure to such a set interface is useful for several reasons. First, the preconditions indicate to data structure clients the conditions under which it is possible to invoke operations. These preconditions are necessary to prove that the field constraint is maintained: if it were not the case that the school parameter sc belonged to the set SC of schools, the insertion would violate the representation invariant.
Similarly, if the student object st were already a member of the student list, the insertion would introduce cycles in the list and violate the implicit acyclicity invariant of the data structure. Also, the postcondition of addStudent communicates the fact that st is in the list after the insertion, preventing clients from executing duplicate calls to addStudent with the same student object. Finally, the set interface expresses an important partial correctness property for the addStudent procedure, so that the verification of the set interface indicates that the procedure is correctly inserting an object into the set of students. Note that the interface of the procedure does not reveal the details of the procedure implementation, thanks to the use of abstract set variables.

Since the set variables in the specification are abstract, any verification of a concrete implementation's conformance to the set interface requires concrete definitions for the abstract variables. The abstraction section in Figure 4 contains this information. First, the abstraction section indicates which analysis (in this case, Bohne decaf, which implements field constraint analysis) is to be used to analyze the module. Next, the abstraction section contains definitions for the abstract variables: namely, ST is defined as the set of Elem objects reachable from the root students through next fields, and SC is the set of Elem objects reachable from schools. (The function rtrancl is a higher-order function that accepts a binary predicate on objects and returns the reflexive transitive closure of the predicate.) The abstraction section also specifies data structure invariants, including field constraints. Field constraints are invariants with the syntactic form ALL x y. (f x = y) --> · · ·. A field f for which there is no field constraint invariant in the abstraction section is considered to be part of the data structure backbone, which has an implicit invariant that it is a union of trees. Finally, the abstraction section may contain additional invariants; our example contains invariants stating the disjointness of the lists rooted at students and schools.

impl module Students {
  format Elem { attends : Elem; next : Elem; }
  var students : Elem;
  var schools : Elem;
  proc addStudent(st:Elem; sc:Elem) {
    st.attends = sc;
    st.next = students;
    students = st;
  }
}

Fig. 1. Implementation for students example

Fig. 2. Students data structure instance (two next-linked lists, with attends edges from students to schools)

spec module Students {
  format Elem;
  specvar ST : Elem set;
  specvar SC : Elem set;
  proc addStudent(st:Elem; sc:Elem)
    requires card(st)=1 & card(sc)=1 & (sc in SC) &
             (not (st in ST)) & (not (st in SC))
    modifies ST
    ensures ST' = ST + st;
}

Fig. 3. Specification for students example

abst module Students {
  use plugin "Bohne decaf";
  ST = { x : Elem | "rtrancl (% v1 v2. next v1 = v2) students x" };
  SC = { x : Elem | "rtrancl (% v1 v2. next v1 = v2) schools x" };
  invariant "ALL x y. (attends x = y) -->
    (x ~= null --> ((~(rtrancl (% v1 v2. next v1 = v2) students x) --> y = null) &
                    ((rtrancl (% v1 v2. next v1 = v2) students x) -->
                     (rtrancl (% v1 v2. next v1 = v2) schools y))))";
  invariant "ALL x. (x ~= null & (rtrancl (% v1 v2. next v1 = v2) schools x) -->
    ~(rtrancl (% v1 v2. next v1 = v2) students x))";
  invariant "ALL x. (x ~= null & (rtrancl (% v1 v2. next v1 = v2) students x) -->
    ~(rtrancl (% v1 v2. next v1 = v2) schools x))";
  ...
}

Fig. 4. Abstraction for students example
Our Bohne analysis verifies the conformance of a procedure to its specification as follows. It first desugars the modifies clauses into a frame formula and conjoins it with the ensures clause, then replaces the abstract sets in preconditions and postconditions with their definitions from the abstraction section, obtaining a procedure contract in terms of the concrete state variables (next and attends). It then conjoins the representation invariants of the data structure to the preconditions and postconditions. For a loop-free procedure such as addStudent, the analysis can then generate a verification condition whose validity implies that the procedure conforms to its interface.

The generated verification condition for our example cannot directly be solved using decision procedures such as MONA: it contains the function symbol attends, which violates the tree invariant required by MONA. Section 3 describes how our analysis uses the field constraints in the verification condition to verify the validity of such verification conditions. Our analysis successfully verifies the property that, for any student, attends points to some (undetermined) element of the SC set of schools. Note that this goes beyond the power of previous analyses, which required that the identity of the school pointed to by the student be functionally determined by the identity of the student. The example therefore illustrates how our analysis eliminates a key restriction of previous approaches: certain data structures exhibit properties that the logics in previous approaches were not expressive enough to capture.

2.2 Skip List

We next present the analysis of a two-level skip list. Skip lists [30] support logarithmic average-time access to elements by augmenting a linked list with sublists that skip over some of the elements in the list. The two-level skip list is a simplified implementation of a skip list with only two levels: the list containing all elements, and a sublist of this list. Figure 5 presents an example two-level skip list. Our implementation uses the next field to represent the main list, which forms the backbone of the data structure, and uses the derived nextSub field to represent a sublist of the main list.

We focus on the add procedure, which inserts an element into an appropriate position in the skip list. Figure 6 presents the implementation of add, which first searches through nextSub links to get an estimate of the position of the entry, then finds the entry by searching through next links, and inserts the element into the main next-linked list. Optionally, the procedure also inserts the element into the nextSub list; this is modelled using a non-deterministic choice in our language and is an abstraction of the insertion with a certain probability in the original implementation.

Fig. 5. An instance of a two-level skip list

impl module Skiplist {
  format Entry { v : int; next, nextSub : Entry; }
  var root : Entry;
  proc add(e:Entry) {
    assume "e ~= root";
    int v = e.v;
    Entry sprev = root, scurrent = root.nextSub;
    while ((scurrent != null) && (scurrent.v < v)) {
      sprev = scurrent;
      scurrent = scurrent.nextSub;
    }
    Entry prev = sprev, current = sprev.next;
    while ((current != scurrent) && (current.v < v)) {
      prev = current;
      current = current.next;
    }
    e.next = current;
    prev.next = e;
    choice {
      sprev.nextSub = e;
      e.nextSub = scurrent;
    } | {
      e.nextSub = null;
    }
  }
}

Fig. 6. Skip list implementation
Figure 7 presents a specification for add, which indicates that add always inserts the element into the set of elements stored in the list. Figure 8 presents the abstraction section for the two-level skip list. This section defines the abstract set S as the set of nodes reachable from root.next, indicating that root is used as a header node. The abstraction section contains three invariants. The first invariant is the field constraint on the field nextSub, which defines it as a derived field. Note that the constraint for this derived field is non-deterministic, because it only states that if x.nextSub==y, then there exists a path of length at least one from x to y along next fields, without indicating where exactly nextSub points. Indeed, the simplicity of the skip list implementation stems from the fact that the position of nextSub is not uniquely given by next; it depends not only on the history of invocations, but also on the random number generator used to decide when to introduce new nextSub links. The ability to support such non-deterministic constraints is what distinguishes our approach from approaches that can only handle deterministic fields. The last two invariants indicate that root is never null (assuming, for simplicity of the example, that the state is initialized), and that all objects not reachable from root are isolated: they have no incoming or outgoing next pointers. These two invariants allow the analysis to conclude that the object referenced by e in add(e) is not referenced by any node, which, together with the precondition not(e in S), allows our analysis to prove that the objects remain in an acyclic list along the next field.¹

Our analysis successfully verifies that add preserves all invariants, including the non-deterministic field constraint on nextSub. While doing so, the analysis takes advantage of these invariants as well, as is usual in assume/guarantee reasoning. In this example, the analysis is able to infer the loop invariants in add. The analysis constructs these loop invariants as disjunctions of universally quantified boolean combinations of unary predicates over heap objects, using symbolic shape analysis [33, 29]. These unary predicates correspond to the sets that are supplied in the abstraction section using the proc keyword.

¹ The analysis still needs to know that e is not identical to the header node. In this example we have used an explicit assume "e ~= root" statement to supply this information. Such assume statements can be automatically generated if the developer specifies the set of representation objects of a data structure, but this is orthogonal to field constraint analysis itself.

spec module Skiplist {
  format Entry;
  specvar S : Entry set;
  proc add(e:Entry)
    requires card(e) = 1 & not (e in S)
    modifies S
    ensures S' = S + e';
}

Fig. 7. Skip list specification

abst module Skiplist {
  use plugin "Bohne";
  S = {x : Entry | "rtrancl (% v1 v2. next v1 = v2) (next root) x"};
  invariant "ALL x y. (nextSub x = y) -->
    ((x = null --> y = null) &
     (x ~= null --> rtrancl (% v1 v2. next v1 = v2) (next x) y))";
  invariant "root ~= null";
  invariant "ALL x. x ~= null & ~(rtrancl (% v1 v2. next v1 = v2) root x) -->
    ~(EX y. y ~= null & next y = x) & (next x = null)";
  proc add {
    has_pred = {x : Entry | "EX y. next y = x"};
    r_current = {x : Entry | "rtrancl (% v1 v2. next v1 = v2) current x"};
    r_scurrent = {x : Entry | "rtrancl (% v1 v2. next v1 = v2) scurrent x"};
    r_sprev = {x : Entry | "rtrancl (% v1 v2. next v1 = v2) sprev x"};
    next_null = {x : Entry | "next x = null"};
    sprev_nextSub = {x : Entry | "nextSub sprev = scurrent"};
    prev_next = {x : Entry | "next prev = current"};
  }
}

Fig. 8. Skip list abstraction (including invariants)

3 Field Constraint Analysis

This section presents the field constraint analysis algorithm and proves its soundness as well as, for some important cases, its completeness.
We consider a logic L over a signature Σ, where Σ consists of unary function symbols f ∈ Fld corresponding to fields in data structures, and constant symbols c ∈ Var corresponding to program variables. We use monadic second-order logic (MSOL) of trees as our working example, but in general we only require L to support conjunction, implication, and equality reasoning.

A Σ-structure S is a first-order interpretation of the symbols in Σ. For a formula F in L, we denote by Fields(F) ⊆ Σ the set of all fields occurring in F. We assume that L is decidable over some set of well-formed structures, and we assume that this set of structures is expressible by a formula I in L. We call I the simulation invariant [11]. For simplicity, we consider the simulation itself to be given by the restriction of a structure to the fields in Fields(I), i.e. we assume that there exists a decision procedure for checking the validity of implications of the form I → F, where F is a formula such that Fields(F) ⊆ Fields(I). In our running example, MSOL of trees, the simulation invariant I states that the fields in Fields(I) span a forest. We call a field f ∈ Fields(I) a backbone field, and we call a field f ∈ Fld \ Fields(I) a derived field. We refer to the decision procedure for formulas with fields in Fields(I), over the set of structures defined by the simulation invariant I, as the underlying decision procedure. Field constraint analysis enables the use of the underlying decision procedure to reason about non-deterministically constrained derived fields. We state the invariants on the derived fields using field constraints.

Definition 1 (Field constraints on derived fields). A field constraint Df for a simulation invariant I and a derived field f is a formula of the form

  Df ≡ ∀x y. f(x) = y → FCf(x, y)

where FCf is a formula with two free variables such that (1) Fields(FCf) ⊆ Fields(I), and (2) FCf is total with respect to I, i.e. I |= ∀x. ∃y. FCf(x, y). We call the constraint Df deterministic if FCf is deterministic with respect to I, i.e.

  I |= ∀x y z. FCf(x, y) ∧ FCf(x, z) → y = z .

We write D for the conjunction of Df for all derived fields f.

Note that Definition 1 covers arbitrary constraints on a field, because Df is equivalent to ∀x. FCf(x, f(x)). The totality condition (2) is not required for the soundness of our approach, only for its completeness, and it rules out invariants equivalent to "false". The condition (2) does not involve derived fields and can therefore be checked automatically using a single call to the underlying decision procedure.

Our goal is to check the validity of formulas of the form I ∧ D → G, where G is a formula with possible occurrences of derived fields. If G does not contain any derived fields, then there is nothing to do, because we can answer the query using the underlying decision procedure. To check the validity of I ∧ D → G, we therefore proceed as follows. We first obtain a formula G′ from G by eliminating all occurrences of derived fields in G. Next, we check the validity of G′ with respect to I.
In the case of a derived field f that is defined by a deterministic field constraint, the occurrences of f in G can be eliminated by flattening the formula and substituting each term f(x) = y by FCf(x, y). However, in the general case of non-deterministic field constraints, such a substitution is only sound for negative occurrences of derived fields, since the field constraint gives only an over-approximation of the derived field. Therefore, a more sophisticated elimination algorithm is needed.

Eliminating derived fields. Figure 9 presents our algorithm Elim for the elimination of derived fields. Consider a derived field f. The basic idea of Elim is that we can replace an occurrence G(f(x)) of f by a fresh variable y that satisfies FCf(x, y), yielding a stronger formula ∀y. FCf(x, y) → G(y). As an improvement, if G contains two occurrences f(x1) and f(x2), and if x1 and x2 evaluate to the same value, then we attempt to replace f(x1) and f(x2) with the same value. Elim implements this idea using the set K of triples (x, f, y) to record previously assigned values for f(x). Elim runs in time O(n²), where n is the size of the formula, and produces an at most quadratically larger formula.

Elim accepts formulas in negation normal form, where all negation signs apply to atomic formulas. We generally assume that each quantifier Q z binds a variable z that is distinct from the other bound variables and distinct from the free variables of the entire formula. The algorithm Elim is presented as acting on first-order formulas, but it is also applicable to checking the validity of quantifier-free formulas, because it only introduces universal quantifiers, which can be replaced by Skolem constants. The algorithm is also applicable to multisorted logics, and, by treating sets of elements as a new sort, to MSOL. To make the discussion simpler, we consider a deterministic version of Elim where the non-deterministic choices of variables and terms are resolved by some arbitrary, but fixed, linear ordering on terms. We write Elim(G) to denote the result of applying Elim to a formula G.

  S          − a term or a formula
  Terms(S)   − terms occurring in S
  FV(S)      − variables free in S
  Ground(S)  = {t ∈ Terms(S). FV(t) ⊆ FV(S)}
  Derived(S) − derived function symbols occurring in S

  proc Elim(G) = elim(G, ∅)

  proc elim(G : formula in negation normal form;
            K : set of (variable, field, variable) triples):
    let T = {f(t) ∈ Ground(G). f ∈ Derived(G) ∧ Derived(t) = ∅}
    if T ≠ ∅ do
      choose f(t) ∈ T
      choose x, y fresh first-order variables
      let F1 = FCf(x, y) ∧ ⋀_{(xi,f,yi)∈K} (x = xi → y = yi)
      let G1 = G[f(t) := y]
      return ∀x. x = t → ∀y. (F1 → elim(G1, K ∪ {(x, f, y)}))
    else case G of
      | Q x. G1 where Q ∈ {∀, ∃} : return Q x. elim(G1, K)
      | G1 op G2 where op ∈ {∧, ∨} : return elim(G1, K) op elim(G2, K)
      | else : return G

Fig. 9. Derived-field elimination algorithm
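To make the recursion concrete, here is a runnable Python sketch of Elim on a toy tuple-based AST. It assumes that derived fields are applied only to ground, derived-free terms (the Ground(G) scoping side conditions are elided), and the AST encoding and the field-constraint table fc are illustrative choices, not the paper's data structures:

  from itertools import count

  FRESH = count(1)

  def subterms(t):
      yield t
      if t[0] == 'app':
          yield from subterms(t[2])

  def terms(F):
      # all terms occurring in a formula; terms are ('var', v) or ('app', f, t)
      if F[0] in ('var', 'app'):
          yield from subterms(F)
      else:
          for part in F[1:]:
              if isinstance(part, tuple):
                  yield from terms(part)

  def derived_free(t, derived):
      return all(not (s[0] == 'app' and s[1] in derived) for s in subterms(t))

  def subst(F, old, new):
      # replace every occurrence of the term `old` by `new`
      if F == old:
          return new
      if isinstance(F, tuple):
          return tuple(subst(p, old, new) if isinstance(p, tuple) else p for p in F)
      return F

  def elim(G, fc, derived, K=()):
      cand = [t for t in terms(G)
              if t[0] == 'app' and t[1] in derived and derived_free(t[2], derived)]
      if not cand:
          return G
      f, arg = cand[0][1], cand[0][2]      # fixed choice, matching the paper's
      x = ('var', 'x%d' % next(FRESH))     # deterministic version of Elim
      y = ('var', 'y%d' % next(FRESH))
      F1 = fc[f](x, y)                     # FC_f(x, y) ...
      for xi, fi, yi in K:                 # ... plus consistency with earlier choices
          if fi == f:
              F1 = ('and', F1, ('imp', ('eq', x, xi), ('eq', y, yi)))
      G1 = subst(G, cand[0], y)
      return ('forall', x, ('imp', ('eq', x, arg),
              ('forall', y, ('imp', F1, elim(G1, fc, derived, K + ((x, f, y),))))))

  # FC_nextSub(x, y) = next+(x, y), as in the skip list example:
  fc = {'nextSub': lambda x, y: ('pred', 'tc_next', x, y)}
  G = ('not', ('eq', ('app', 'nextSub', ('var', 'root')), ('var', 'null')))
  print(elim(G, fc, {'nextSub'}))

On the example formula ¬(nextSub(root) = null), the sketch produces the expected strengthening ∀x. x = root → ∀y. (next⁺(x, y) → ¬(y = null)).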
The correctness of Elim is given by Theorem 1. The proof of Theorem 1 relies on the monotonicity of logical operations and quantifiers in the negation normal form of a formula. (Proofs for the theorems stated here can be found in [34].)

Theorem 1 (Soundness). The algorithm Elim is sound: if I ∧ D |= Elim(G), then I ∧ D |= G. What is more, I ∧ D ∧ Elim(G) |= G.

We now analyze the classes of formulas G for which Elim is complete.

Definition 2 (Completeness). We say that Elim is complete for (D, G) iff I ∧ D |= G implies I ∧ D |= Elim(G).

Note that we cannot hope to achieve completeness for arbitrary constraints D. Indeed, if we let D ≡ true, then D imposes no constraint whatsoever on the derived fields, and reasoning about the derived fields becomes reasoning about uninterpreted function symbols, that is, reasoning in unconstrained predicate logic. Such reasoning is undecidable not only for monadic second-order logic, but also for much weaker fragments of first-order logic [7]. Despite these general observations, we have identified two cases important in practice for which Elim is complete (Theorem 2 and Theorem 3). Theorem 2 expresses the fact that, in the case where all field constraints are deterministic, Elim is complete (and then it reduces to previous approaches [11, 26] that are restricted to the deterministic case). The proof of Theorem 2 uses the assumption that F is total and functional to conclude ∀x y. FCf(x, y) → f(x) = y, and then uses an inductive argument similar to the proof of Theorem 1.

Theorem 2 (Completeness for deterministic fields). Elim is complete for (D, G) when each field constraint in D is deterministic. Moreover, I ∧ D ∧ G |= Elim(G).

We next turn to completeness in the cases that admit non-determinism of derived fields. Theorem 3 states that our algorithm is complete for derived fields introduced by the weakest precondition operator applied to a class of postconditions that includes field constraints. This result is important in practice: a previous, incomplete, version of our elimination algorithm was not able to verify the skip list example in Section 2.2. To formalize our completeness result, we introduce two classes of well-behaved formulas: nice formulas and pretty nice formulas.

Definition 3 (Nice formulas). A formula G is a nice formula if each occurrence of each derived field f in G is of the form f(t), where t ∈ Ground(G).

Nice formulas generalize the notion of quantifier-free formulas by disallowing quantifiers only for variables that are used as arguments to derived fields. We can show that the elimination of derived fields from nice formulas is complete. The intuition behind this result is that if I ∧ D |= G, then for the choice of yi such that FCf(xi, yi) we can find an interpretation of the function symbol f such that f(xi) = yi and I ∧ D holds, so G holds as well, and Elim(G) evaluates to the same truth value as G.

Definition 4 (Pretty nice formulas). The set of pretty nice formulas is defined inductively by: 1) a nice formula is pretty nice; 2) if G1 and G2 are pretty nice, then G1 ∧ G2 and G1 ∨ G2 are pretty nice; 3) if G is pretty nice and x is a first-order variable, then ∀x. G is pretty nice.

Pretty nice formulas therefore additionally admit universal quantification over the arguments of derived fields. We define the function skolem, which strips (top-level) universal quantifiers, as follows: 1) skolem(G1 op G2) = skolem(G1) op skolem(G2), where op ∈ {∨, ∧}; 2) skolem(∀x. G) = skolem(G); and 3) skolem(G) = G, otherwise. Note that pretty nice formulas are closed under wlp (up to formula equivalence); the closure property follows from the conjunctivity of the weakest precondition operator.

  x ∈ Var − program variables
  f ∈ Fld − pointer fields
  e ∈ Exp ::= x | e.f
  F − quantifier-free formula
  c ∈ Com ::= e1 := e2 | assume(F) | assert(F)
            | havoc(x)      (non-deterministic assignment to x)
            | c1 ; c2 | c1 [] c2   (sequential composition and non-deterministic choice)

Fig. 10. Loop-free statements of a guarded command language (see e.g. [1])
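The commands of Fig. 10 have the standard wlp rules (substitution for assignment, implication for assume, conjunction for assert and choice, universal quantification for havoc). Below is a self-contained Python sketch over a tuple AST; the ('upd', f, e1, e2) encoding of function update and the node tags are illustrative choices, not the paper's implementation:

  def subst(F, old, new):
      # syntactic replacement of the subtree `old` by `new`
      if F == old:
          return new
      if isinstance(F, tuple):
          return tuple(subst(p, old, new) for p in F)
      return F

  def wlp(c, G):
      tag = c[0]
      if tag == 'assign':            # x := e, for a variable x
          return subst(G, c[1], c[2])
      if tag == 'store':             # e1.f := e2, modeled as f := f[e1 := e2]
          _, f, e1, e2 = c
          return subst(G, f, ('upd', f, e1, e2))
      if tag == 'assume':            # wlp(assume(F), G) = F -> G
          return ('imp', c[1], G)
      if tag == 'assert':            # wlp(assert(F), G) = F /\ G
          return ('and', c[1], G)
      if tag == 'havoc':             # wlp(havoc(x), G) = forall x. G
          return ('forall', c[1], G)
      if tag == 'seq':               # wlp(c1 ; c2, G) = wlp(c1, wlp(c2, G))
          return wlp(c[1], wlp(c[2], G))
      if tag == 'choice':            # wlp(c1 [] c2, G) = wlp(c1, G) /\ wlp(c2, G)
          return ('and', wlp(c[1], G), wlp(c[2], G))
      raise ValueError('unknown command: %r' % (c,))

  # e.g. the first update from Example 1 below, e.nextSub := root.nextSub, becomes
  # wlp(('store', 'nextSub', 'e', ('app', 'nextSub', 'root')), G), which replaces
  # every occurrence of 'nextSub' in G by the function-update term.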
Theorem 3 implies that Elim is a complete technique for checking preservation (over straight-line code) of field constraints, even if they are conjoined with additional pretty nice formulas. Elimination is also complete for data structure operations with loops, as long as the necessary loop invariants are pretty nice.

Theorem 3 (Completeness for preservation of field constraints). Let G be a pretty nice formula, D a conjunction of field constraints, and c a guarded command (Figure 10). Then I ∧ D |= wlp(c, G ∧ D) iff I |= Elim(wlp(c, skolem(G ∧ D))).

Example 1. The example in Figure 11 demonstrates the elimination of derived fields using algorithm Elim. It is inspired by the skip list module from Section 2.

    D_nextSub ≡ ∀v1 v2. nextSub(v1) = v2 → next^+(v1, v2)

    G  ≡ wlp((e.nextSub := root.nextSub ; e.next := root), D_nextSub)
       ≡ ∀v1 v2. nextSub[e := nextSub(root)](v1) = v2 → (next[e := root])^+(v1, v2)

    G' ≡ skolem(Elim(G))
       ≡ x1 = root → next^+(x1, y1) →
         x2 = v1 → next^+[e := y1](x2, y2) ∧ (x2 = x1 → y2 = y1) →
         y2 = v2 → (next[e := root])^+(v1, v2)

Fig. 11. Elimination of derived fields from a pretty nice formula. The notation next^+ denotes the irreflexive transitive closure of the predicate next(x) = y.

The formula G expresses the preservation of the field constraint D_nextSub for updates of fields next and nextSub that insert e in front of root. The formula G is valid under the assumption that ∀x. next(x) ≠ e. Elim first replaces the inner occurrence nextSub(root) and then the outer occurrence of nextSub. Theorem 3 implies that the resulting formula skolem(Elim(G)) is valid under the same assumptions as the original formula G.

Limits of completeness. In our implementation, we have successfully used Elim in the context of MSOL, where we encode transitive closure using second-order quantification. Unfortunately, formulas that contain transitive closure of derived fields are often not pretty nice, leading to false alarms after the application of Elim. This behavior is to be expected, due to the undecidability of transitive closure logics over general graphs [10]. On the other hand, unlike approaches based on axiomatizations of transitive closure in first-order logic, our use of MSOL enables complete reasoning about reachability over the backbone fields. It is therefore useful to be able to consider a field as part of a backbone whenever possible. For this purpose, it can be helpful to verify conjunctions of constraints using different backbones for different conjuncts.

Verifying conjunctions of constraints. In our skip list example, the field nextSub forms an acyclic (sub-)list. It is therefore possible to verify the conjunction of constraints independently, with nextSub a derived field in the first conjunct (as in Section 2.2) but a backbone field in the second conjunct. Therefore, although the reasoning about transitive closure is incomplete in the first conjunct, it is complete in the second conjunct.

Verifying programs with loop invariants. The technique described so far supports the following approach for verifying programs annotated with loop invariants:
1. generate verification conditions using loop invariants, pre-, and postconditions;
2. eliminate derived fields from verification conditions using Elim (and skolem);
3. decide the resulting formula using a decision procedure such as MONA [13].
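The three steps compose into a small driver. The sketch below is hypothetical glue code: every component bound by the argument record (wlp, skolem, elim, the formula constructors, and the MONA interface) is an assumed stand-in for the machinery named above, not code from the paper.

    (* Hypothetical driver for steps 1-3; all record components are assumed. *)
    fun verify {wlp, skolem, elim, monaValid, conj, imp} (i, d, c, post) =
      let
        val vc  = wlp (c, conj (post, d))     (* step 1: verification condition *)
        val vc' = elim (skolem vc)            (* step 2: eliminate derived fields *)
      in
        monaValid (imp (conj (i, d), vc'))    (* step 3: decide with MONA *)
      end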
Field constraints specific to program points. Our completeness results also apply when, instead of having one global field constraint, we introduce different field constraints for each program point. This allows the developer to refine data structure invariants with information specific to particular program points.

Field constraint analysis and loop invariant inference. Field constraint analysis is not limited to verification in the presence of loop invariants. In combination with abstract interpretation [3], it can be used to infer loop invariants automatically. Our implementation combines field constraint analysis with symbolic shape analysis based on Boolean heaps [33, 29] to infer loop invariants that are disjunctions of universally quantified Boolean combinations of unary predicates over heap objects. Symbolic shape analysis casts the idea of three-valued shape analysis [32] in the framework of predicate abstraction. It uses the machinery of predicate abstraction to automatically construct the abstract post operator; this construction proceeds solely by deductive reasoning. The computation of the abstraction amounts to checking the validity of entailments of the form Γ ∧ C → wlp(c, p), where Γ is an over-approximation of the reachable states, C is a conjunction of abstraction predicates, and p is a single abstraction predicate. We use field constraint analysis to check the validity of these formulas by augmenting them with the appropriate simulation invariant I and field constraints D that specify the data structure invariants we want to preserve: I ∧ D ∧ Γ ∧ C → wlp(c, p). The only problem arises from the fact that these additional invariants may be temporarily violated during program execution. To ensure the applicability of the analysis, we abstract complete loop-free paths in the control-flow graph of the program at once. This means that we only require that simulation invariants and field constraints are valid at loop cut points; effectively, these invariants are implicit conjuncts in each loop invariant. This approach supports the programming model where violations of invariants are confined to the interior of basic blocks [26].

Amortizing invariant checking in loop invariant inference. A straightforward approach for combining field constraint analysis with abstract interpretation would do a well-formedness check for global invariants and field constraints at every step of the fixed-point computation, invoking a decision procedure at each iteration step. The following insight allows us to use a single well-formedness check per basic block instead: the loop invariant synthesized in the presence of the well-formedness check is identical to the loop invariant synthesized by ignoring the well-formedness check. We therefore speculatively compute the abstraction of the system under the assumption that both the simulation invariant and the field constraints are preserved. After the least fixed point lfp# of the abstract system has been computed, we generate for every loop-free path c with start point ℓ_c a verification condition

    I ∧ D ∧ lfp#_{ℓ_c} → wlp(c, I ∧ D)

where lfp#_{ℓ_c} is the projection of lfp# to program location ℓ_c. We then again use our Elim algorithm to eliminate derived fields, and check the validity of these verification conditions. If they are all valid, then the analysis is sound and the data structure invariants are preserved. Note that this approach succeeds whenever the straightforward approach would have succeeded, so it improves analysis performance without degrading precision.
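In outline, the amortized check can be organized as follows. This sketch is hypothetical, and every component in its argument record (the fixed-point engine, the path enumeration, wlp, the validity check, and the formula constructors) is an assumed stand-in:

    (* Hypothetical sketch of the amortized check described above. *)
    fun amortizedCheck {abstractLfp, loopFreePaths, wlp, conj, imp, valid}
                       (i, d, prog) =
      let
        val lfp = abstractLfp prog               (* speculative fixed point *)
        fun pathOk (loc, c) =                    (* one VC per loop-free path c *)
          valid (imp (conj (conj (i, d), lfp loc),
                      wlp (c, conj (i, d))))
      in
        List.all pathOk (loopFreePaths prog)
      end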
Moreover, when the analysis detects an error, it repeats the fixed-point computation with the simple approach to obtain an indication of the error trace.

4 Deployment as Modular Analysis Plugin

We have implemented our field constraint analysis and deployed it as the Bohne and Bohne decaf analysis plugins of our Hob framework [21, 16]. (Bohne decaf is a simpler version of Bohne that does not do loop invariant inference.) We have successfully verified singly-linked lists, doubly-linked lists with and without iterators and header nodes, insertion into a tree with parent pointers, two-level skip lists (Section 2.2), and our students example from Section 2. When the developer supplies loop invariants, these benchmarks, including the skip list, verify in 1.7 seconds (for the doubly-linked list) to 8 seconds (for insertion into a tree). Bohne automatically infers loop invariants for insertion and lookup in the two-level skip list in 30 minutes total. We believe the running time for loop invariant inference can be reduced using ideas such as lazy predicate abstraction [8].

Because we have integrated Bohne into the Hob framework, we were able to verify just the parts of programs which require the power of field constraint analysis with the Bohne plugin, while using less detailed analyses for the remainder of the program. We have used the list data structures verified with Bohne as modules of larger examples, such as the 900-line Minesweeper benchmark and the 1200-line web server benchmark. Hob's pluggable analysis approach allowed us to use the typestate plugin [20] and loop invariant inference techniques to efficiently verify client code, while reserving shape analysis for the container data structures.

5 Further Related Work

We are not aware of any previous work that provides completeness guarantees for analyzing tree-like data structures with non-deterministic cross-cutting fields for expressive constraints such as MSOL. TVLA [32, 24] was initially designed as an analysis framework with user-supplied transfer functions; subsequent work addresses the synthesis of transfer functions using finite differencing [31], which is not guaranteed to be complete. Decision procedures [25, 18] are effective at reasoning about local properties, but are not complete for reasoning about reachability. Promising, although still incomplete, approaches include [23] as well as [28, 19]. Some reachability properties can be reduced to first-order properties using hints in the form of ghost fields [15, 25]. Completeness of analysis can be achieved by representing loop invariants or candidate loop invariants by formulas in a logic that supports transitive closure [26, 36, 17, 35, 37, 33, 29]. These approaches treat the decision procedure as a black box and, when applied to MSOL, inherit the limitations of structure simulation [11]. Our work can be viewed as a technique for lifting existing decision procedures into decision procedures that are applicable to a larger class of structures. Therefore, it can be incorporated into all of these previous approaches.

6 Conclusion

Historically, the primary challenge in shape analysis was seen to be dealing effectively with the extremely precise and detailed consistency properties that characterize many (but by no means all) data structures. Perhaps for this reason, many formalisms were built on logics that supported only data structures with very precisely defined referencing relationships.
This paper presents an analysis that supports both the extreme precision of previous approaches and the controlled reduction in precision required to support a more general class of data structures: those whose referencing relationships may be random, may depend on the history of the data structure, or may vary for some other reason that places the referencing relationships inherently beyond the ability of previous logics and analyses to characterize. We have deployed this analysis in the context of the Hob program analysis and verification system; our results show that it is effective at 1) analyzing individual data structures and 2) verifying interfaces that allow other, more scalable analyses to verify larger-grain data structure consistency properties whose scope spans larger regions of the program.

In a broader context, we view our result as taking an important step towards the practical application of shape analysis. By supporting data structures whose backbone functionally determines the referencing relationships, as well as data structures with inherently less structured referencing relationships, it promises to be able to successfully analyze the broad range of data structures that arise in practice. Its integration within the Hob program analysis and verification framework shows how to leverage this analysis capability to obtain more scalable analyses that build on the results of shape analysis to verify important properties that involve larger regions of the program. Ideally, this research will significantly increase our ability to effectively deploy shape analysis and other subsequently enabled analyses on important programs of interest to the practicing software engineer.

Acknowledgements. We thank Patrick Maier, Alexandru Salcianu, and the anonymous referees for comments on the presentation of the paper.

References

1. R.-J. Back and J. von Wright. Refinement Calculus. Springer-Verlag, 1998.
2. I. Balaban, A. Pnueli, and L. Zuck. Shape analysis by predicate abstraction. In VMCAI'05, 2005.
3. P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In POPL, pages 269–282, 1979.
4. D. Dams and K. S. Namjoshi. Shape analysis through predicate abstraction and model checking. In VMCAI'03, volume 2575 of LNCS, pages 310–323, 2003.
5. P. Fradet and D. L. Métayer. Shape types. In Proc. 24th ACM POPL, 1997.
6. R. Ghiya and L. Hendren. Is it a tree, a DAG, or a cyclic graph? In Proc. 23rd ACM POPL, 1996.
7. E. Grädel. Decidable fragments of first-order and fixed-point logic. From prefix-vocabulary classes to guarded logics. In Proceedings of Kalmár Workshop on Logic and Computer Science, Szeged, 2003.
8. T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy abstraction. In POPL, 2002.
9. N. Immerman. Descriptive Complexity. Springer-Verlag, 1998.
10. N. Immerman, A. M. Rabinovich, T. W. Reps, S. Sagiv, and G. Yorsh. The boundary between decidability and undecidability for transitive-closure logics. In Computer Science Logic (CSL), pages 160–174, 2004.
11. N. Immerman, A. M. Rabinovich, T. W. Reps, S. Sagiv, and G. Yorsh. Verification via structure simulation. In CAV, pages 281–294, 2004.
12. J. L. Jensen, M. E. Jørgensen, N. Klarlund, and M. I. Schwartzbach. Automatic verification of pointer programs using monadic second order logic. In Proc. ACM PLDI, Las Vegas, NV, 1997.
13. N. Klarlund, A. Møller, and M. I. Schwartzbach. MONA implementation secrets. In Proc. 5th International Conference on Implementation and Application of Automata, LNCS, 2000.
14. N. Klarlund and M. I. Schwartzbach. Graph types. In Proc. 20th ACM POPL, Charleston, SC, 1993.
15. V. Kuncak, P. Lam, and M. Rinard. Role analysis. In Proc. 29th POPL, 2002.
16. V. Kuncak, P. Lam, K. Zee, and M. Rinard. Implications of a data structure consistency checking system. In Int. Conf. on Verified Software: Theories, Tools, Experiments (VSTTE, IFIP Working Group 2.3 Conference), Zürich, October 2005.
17. V. Kuncak and M. Rinard. Boolean algebra of shape analysis constraints. In Proc. 5th International Conference on Verification, Model Checking and Abstract Interpretation, 2004.
18. V. Kuncak and M. Rinard. Decision procedures for set-valued fields. In 1st International Workshop on Abstract Interpretation of Object-Oriented Languages (AIOOL 2005), 2005.
19. S. K. Lahiri and S. Qadeer. Verifying properties of well-founded linked lists. In POPL'06, 2006.
20. P. Lam, V. Kuncak, and M. Rinard. Generalized typestate checking for data structure consistency. In 6th International Conference on Verification, Model Checking and Abstract Interpretation, 2005.
21. P. Lam, V. Kuncak, and M. Rinard. Hob: A tool for verifying data structure consistency. In 14th International Conference on Compiler Construction (tool demo), April 2005.
22. O. Lee, H. Yang, and K. Yi. Automatic verification of pointer programs using grammar-based shape analysis. In ESOP, 2005.
23. T. Lev-Ami, N. Immerman, T. Reps, M. Sagiv, S. Srivastava, and G. Yorsh. Simulating reachability using first-order logic with applications to verification of linked data structures. In CADE-20, 2005.
24. T. Lev-Ami, T. Reps, M. Sagiv, and R. Wilhelm. Putting static analysis to work for verification: A case study. In International Symposium on Software Testing and Analysis, 2000.
25. S. McPeak and G. C. Necula. Data structure specifications via local equality axioms. In CAV, pages 476–490, 2005.
26. A. Møller and M. I. Schwartzbach. The Pointer Assertion Logic Engine. In Programming Language Design and Implementation, 2001.
27. S. S. Muchnick and N. D. Jones, editors. Program Flow Analysis: Theory and Applications. Prentice-Hall, Inc., 1981.
28. G. Nelson. Verifying reachability invariants of linked structures. In POPL, 1983.
29. A. Podelski and T. Wies. Boolean heaps. In SAS, 2005.
30. W. Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, 33(6):668–676, 1990.
31. T. Reps, M. Sagiv, and A. Loginov. Finite differencing of logical formulas for static analysis. In Proc. 12th ESOP, 2003.
32. M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM TOPLAS, 24(3):217–298, 2002.
33. T. Wies. Symbolic shape analysis. Master's thesis, Universität des Saarlandes, Saarbrücken, Germany, Sep 2004.
34. T. Wies, V. Kuncak, P. Lam, A. Podelski, and M. Rinard. On field constraint analysis. Technical Report MIT-CSAIL-TR-2005-072, MIT-LCS-TR-1010, MIT CSAIL, November 2005.
35. G. Yorsh, T. Reps, and M. Sagiv. Symbolically computing most-precise abstract operations for shape analysis. In 10th TACAS, 2004.
36. G. Yorsh, T. Reps, M. Sagiv, and R. Wilhelm. Logical characterizations of heap abstractions. TOCL, 2005. (to appear).
37. G. Yorsh, A. Skidanov, T. Reps, and M. Sagiv. Automatic assume/guarantee reasoning for heap-manipulating programs. In 1st AIOOL Workshop, 2005.

A Framework for Certified Program Analysis and Its Applications to Mobile-Code Safety

Bor-Yuh Evan Chang, Adam Chlipala, and George C. Necula
University of California, Berkeley, California, USA
{bec, adamc, necula}@cs.berkeley.edu

Abstract. A certified program analysis is an analysis whose implementation is accompanied by a checkable proof of soundness. We present a framework whose purpose is to simplify the development of certified program analyses without compromising the run-time efficiency of the analyses. At the core of the framework is a novel technique for automatically extracting Coq proof-assistant specifications from ML implementations of program analyses, while preserving to a large extent the structure of the implementation. We show that this framework allows developers of mobile code to provide to the code receivers untrusted code verifiers in the form of certified program analyses. We demonstrate efficient implementations in this framework of bytecode verification, typed assembly language, and proof-carrying code.

1 Introduction

When static analysis or verification tools are used for validating safety-critical code [6], it becomes important to consider the question of whether the results of the analyses are trustworthy [22, 3]. This question is becoming more and more difficult to answer as both the analysis algorithms and their implementations are becoming increasingly complex in order to improve precision, performance, and scalability. We describe a framework whose goal is to assist the developers of program analyses in producing formal proofs that the implementations and algorithms used are sound with respect to a concrete semantics of the code. We call such analyses certified, since they come with machine-checkable proofs of their soundness. We also seek soundness assurances that are foundational, that is, that avoid assumptions or trust relationships that do not seem fundamental to the objectives of users. Our contributions deal with making the development of such analyses more practical, with particular emphasis on not sacrificing the efficiency of the analysis in the process. The strong soundness guarantees given by certified program analyzers and verifiers are important when the potential cost of wrong results is significant.

(This research was supported in part by NSF Grants CCR-0326577, CCF-0524784, and CCR-00225610; an NSF Graduate Fellowship; and an NDSEG Fellowship. The information presented here does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred.)

Moreover, the ability to check independently that the implementation of the analysis is sound allows us to construct a mobile-code receiver that allows untrusted parties to provide the code verifier. The code verifier is presented as a certified program analysis whose proof of soundness entails the soundness of code verification. The main contributions of the framework we propose are the following:

– We describe a methodology for automatically translating implementations of analyses written in a general-purpose language (currently, ML) into models and specifications for a proof assistant (currently, Coq). Specifically, we show how to handle those aspects of a general-purpose language that do not translate directly to the well-founded logic used by the proof assistant, such as side-effects and non-primitive recursive functions.
We use the framework of abstract interpretation [12] to derive the soundness theorems that must be proved for each certified analysis.
– We show a design for a flexible and efficient mobile-code verification protocol, in which the untrusted code producer has complete freedom in the safety mechanisms and compilation strategies used for mobile code, as long as it can provide a code verifier in the form of a certified analysis whose proof of soundness witnesses that the analysis enforces the desired code-receiver safety policy.

In the next section, we describe our program analysis framework and introduce an example analyzer. Then, in Sect. 3, we present our technique for specification extraction from code written in a general-purpose language. We then discuss the program analyzer certification process in Sect. 4. In Sect. 5, we present an application of certified program analysis to mobile-code safety and highlight its advantages; in Sect. 6, we describe how to implement in this architecture (foundational) typed assembly language, Java bytecode verification, and proof-carrying code. Finally, we survey related work (Sect. 7) and conclude (Sect. 8).

2 The Certified Program Analysis Framework

In order to certify a program analysis, one might consider proving directly the soundness of the implementation of the analysis. This is possible in our framework, but we expect that an alternative strategy is often simpler. For each analysis to be certified, we write a certifier that runs after the analysis and checks its results. Then, we prove the soundness of the certifier. This approach has several important advantages. Often the certifier is simpler than the analysis itself. For example, it does not need to iterate more than once over each instruction, and it does not need all the complicated heuristics that the analysis itself might use to speed up the convergence to a fixpoint. Thus, we expect the certifier to be easier to prove sound than the analysis itself. The biggest benefit, however, is that we can use an existing implementation of a program analysis as a black box, even if it is written in a language that we are not ready to analyze formally, and even if the analysis algorithm does not fit perfectly with the formalism desired for the certification and its soundness proofs. As an extreme example, the analysis itself might contain a model checker, while we might want to do the soundness proof using the formalism of abstract interpretation [28].

In Fig. 1, we diagram this basic architecture for the purpose of mobile-code safety. We distinguish between "installation time" activity, which occurs once per analyzer, and "verification time" activity, which occurs once per program to analyze.

Fig. 1. Our certified verifier architecture with the trusted code base shaded

We choose the theory of abstract interpretation [12] as the foundation for the soundness proofs of certifiers because of its generality and because its soundness conditions are simple and well understood. We present first the requirements for the developers of certifiers, and then, in Sect. 4, we describe the soundness verification.

The core of the certifier is an untrusted custom module containing an implementation of the abstract transition relation (provided by the certifier developer). The custom module of a certifier must implement the following signature:

    type absval
    type abs = { pc : nat; a : absval }
    datatype result = Fail | Succ of abs list
    val ainv : abs list
    val astep : abs -> result
The type abs encodes abstract states, which include a program counter and an abstract value of a type that can be chosen by the certifier developer. The value ainv consists of the abstract invariants. They must at a minimum include invariants for the entry points to the code and for each destination of a jump. The function astep implements the abstract transition relation: given an abstract state at a particular instruction, it computes the set of successor states, minus the states already part of ainv. The transition relation may also fail, for example when the abstract state does not ensure the safe execution of the instruction. We will take advantage of this possibility to write safety checkers for mobile code using this framework. In our implementation and in the examples in this paper, we use the ML language for implementing custom certifiers.

In order to execute such certifiers, the framework provides a trusted engine, shown in Fig. 2. The main entry point is the function top, invoked with a list of program counters that have been processed and a list of abstract states still to process. Termination is ensured using two mechanisms: each invocation of the untrusted astep is guarded by a timeout, and each program counter is processed at most once. We use a timeout as a simple alternative to proving termination of astep.

    fun applyWithTimeout (f: 'a -> 'b, x: 'a) : 'b = ...

    fun top (DonePC: nat list, ToDo: abs list) : bool =
      case ToDo of
        nil => true
      | a :: rest =>
          if List.member(a.pc, DonePC) then false
          else
            (case applyWithTimeout(astep, a) of
               Fail => false
             | Succ as => top (a.pc :: DonePC, as @ rest))

    top (nil, ainv)

Fig. 2. The trusted top-level analysis engine. The infix operators :: and @ are list cons and append, respectively.

A successful run of the code shown in Fig. 2 is intended to certify that all of the abstract states given by ainv (i.e., the properties that we are verifying for a program) are invariant, and that the astep function succeeds on all reachable instructions. We take advantage of this latter property to write untrusted code verifiers in this framework (Sect. 5). We discuss these guarantees more precisely in Sect. 4.

Example: Java Bytecode Verifier. Now we introduce an example program analyzer that requires the expressivity of a general-purpose programming language and highlights the challenges in specification extraction. In particular, we consider a certifier in the style of the Java bytecode verifier, but operating on a simple assembly language instead of bytecodes. Fig. 3 presents a fragment of this custom verifier. The abstract value is a partial map from registers to class names, with a missing entry denoting an uninitialized register.¹ In the astep function, we show only the case of the memory write instruction. The framework provides the sel accessor function for partial maps, the instruction decoder instrAt, the partial function fieldOf that returns the type of a field at a certain offset, and the partial function super that returns the super class of a class. This case succeeds only if the destination address is of the form rdest + n, with register rdest pointing to an object of class cdest that has at offset n a field of type cdest', which must be a super class of the type of register rsrc.
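For illustration, the subClass test elided in Fig. 3 can be a direct walk up the class hierarchy. The following reconstruction is obtained from Fig. 4 below by dropping the depth instrumentation that the translation adds; it assumes the framework's super function and the class type:

    (* Plausible untranslated subClass, reconstructed from Fig. 4 by
       removing the added depth instrumentation; super is the framework's
       partial super-class function. *)
    fun subClass (c1 : class, c2 : class) : bool =
      c1 = c2 orelse
      (case super c1 of
         NONE => false
       | SOME sup => subClass (sup, c2))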
We omit the code for calculatePreconditions, a function that obtains some preconditions from the meta-data packaged with the .class files and then uses an iterative fixed-point algorithm to find a good typing precondition for each program label. Each such precondition should be satisfied any time control reaches its label. This kind of algorithm is standard and well studied, in the context of the Java Bytecode Verifier and elsewhere, so we omit the details here. Most importantly, we will not need to reason formally about the correctness of this algorithm.

¹ In the actual implementation, registers that hold code pointers (e.g., return addresses, or dynamic dispatch addresses) are assigned types that specify the abstract state expected by the destination code block.

    type absval = (reg, class) partialmap
    type abs = { pc : nat; a : absval }

    fun subClass (c1 : class, c2 : class) = ...
    fun calculatePreconditions () : abs list = ...
    val ainv : abs list = calculatePreconditions ()

    fun astep (a : abs) : result =
      case instrAt(a.pc) of
        Write(rdest + n, rsrc) =>
          (case (sel(a.a, rdest), sel(a.a, rsrc)) of
             (SOME cdest, SOME csrc) =>
               (case fieldOf(cdest, n) of
                  SOME cdest' =>
                    if subClass(csrc, cdest')
                    then Succ [ { a = a.a, pc = a.pc + 1 } ]
                    else Fail
                | _ => Fail)
           | _ => Fail)
      | ...

Fig. 3. Skeleton of a verifier in the style of the Java bytecode verifier

3 Specification Extraction

To obtain certified program analyses, we need a methodology for bridging the gap between an implementation of the analysis and a specification that is suitable for use in a proof assistant. An attractive technique is to start with the specification and its proof, and then use the program extraction supported by proof assistants such as Coq or Isabelle [29] to obtain the implementation. This strategy is very proof-centric, and while it does yield a sound implementation, it makes it hard to control non-soundness-related aspects of the code, such as efficiency, instrumentation for debugging, or interaction with external libraries. Yet another alternative is based on verification conditions [15, 17], where each function is first annotated with a pre- and postcondition, and the entire program is compiled into a single formula whose validity implies that the program satisfies its specification. Such formulas can make good inputs to automated deduction tools, but they are usually quite confusing to a human prover, and they lose much of the structure of the original program. Moreover, in our experience, most auxiliary functions in a program analyzer do a good job of serving as their own specifications (e.g., the subClass function). Since it is inevitable that proving soundness will be sufficiently complicated to require human guidance, we seek an approach that maintains as close a correspondence between the implementation and its model as possible. For non-recursive, purely functional programs we can easily achieve the ideal, as the implementation can reasonably function as its own model in a suitable logic, such as that of the Coq proof assistant.

    fun subClass ([depth : nat,] c1 : class, c2 : class) : bool [option] =
      c1 = c2 orelse
      (case super c1 of
         NONE => [SOME] false
       | SOME sup => [if depth = 0 then NONE else]
                     subClass'([depth-1,] sup, c2))

Fig. 4. Translation of the subClass function. The elements in square brackets (boxed in the original) are added by our translation.
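As a toy example of that ideal (ours, not from the paper): an effect-free, non-recursive ML function can be read off as its own Coq definition, term for term.

    (* Effect-free and non-recursive, so this definition can serve directly
       as its own specification in the proof assistant. *)
    fun max3 (a : int, b : int, c : int) : int =
      Int.max (a, Int.max (b, c))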
This suggests that we need a way to handle imperative features, and a method for dealing with non-primitive recursive functions. In the remainder of this section, we give an overview of our approach. More detail can be found in the companion technical report [9].

Handling Recursion. We expect that all invocations of the recursive functions used during certification terminate, although it may be inconvenient to write all functions in primitive recursive form, as required by Coq. In our framework, we force termination of all function invocations using timeouts. This means that for each successful run (i.e., one that does not time out) there is a bound on the call-stack depth. We use this observation to make all functions primitive recursive on the call-stack depth. When we translate a function definition, we add an explicit argument depth that is checked and decremented at each function call. Fig. 4 shows the result of translating a typical implementation of the subClass function for our running example. The bracketed elements are added by the translation. Note that, in order to be able to signal a timeout, the return type of the function is an option type. Coq will accept this function because it can check syntactically that it is primitive recursive in the depth argument. This translation preserves any partial correctness property of the code. For example, if we can prove about the specification that any invocation of subClass that yields SOME true implies that the two classes are in a subclass relationship, then the same property holds for the original code whenever it terminates with the value true.

Handling Imperative Features. The function calculatePreconditions from Fig. 3 uses I/O operations to read and decode the basic block invariants from the .class file (as in the KVM [30] version of Java), or must use an intraprocedural fixed-point computation to deduce the basic block preconditions from the method start precondition (as for standard .class files). In any case, this function most likely uses a significant number of imperative constructs or even external libraries. This example demonstrates a situation where the result of a complex computation is used only as a hint, whose exact value is not important for soundness but only for completeness. We believe that this is often the case when writing certifiers, which suggests that a monadic [31] style of translation would unnecessarily complicate the resulting specification. For such situations we propose a cheaper translation scheme that soundly abstracts the result of side-effecting operations. We describe this scheme informally, by means of an example of functions that read 16-bit and 32-bit numbers, written in big-endian notation, from a Java .class file, shown in Fig. 5.

    fun readu16 (s: callstate, buff: int array, idx: int) : int =
      256 * (freshread1 s) + (freshread2 s)

    fun readu32 (s: callstate, buff: int array, idx: int) : int =
      65536 * readu16(freshstate3 s, buff, idx)
            + readu16(freshstate4 s, buff, idx+2)

Fig. 5. Translation of functions for reading 16-bit and 32-bit big-endian numbers from a class file. The original body of readu16, before translation, is 256 * buff[idx] + buff[idx+1].

Each update to mutable state is ignored. Each syntactic occurrence of a mutable-state access is replaced with a fresh abstract function (e.g., freshread1) whose argument is an abstraction of the call-stack state.
The call-stack argument is needed to ensure that no relationship can be deduced between recursive invocations of the same syntactic state access. Each function whose body reads mutable state, or calls functions that read mutable state, gets a new parameter s that is the abstraction of the call-stack state. Whenever such a function calls another function that needs a call-stack argument, it uses a fresh transformer (e.g., freshstate3) to produce the new actual state argument. This abstraction is sound in the sense that it ensures that nothing can be proved about the results of mutable state accesses, and thus any property that we can prove about this abstraction also holds for the actual implementation. If we did not have the call-stack argument, one could prove that each invocation of the readu16 function produces the same result, and thus that all results of readu32 are multiples of 65,537. This latter example also shows why we cannot use the depth argument as an abstraction of the call-stack state.

Note that our use of "state" differs from the well-known "explicit state-passing style" in functional programming, where state is used literally to track all mutable aspects of the execution environment. That translation style requires that each function that updates the state not only take an input state but also produce an output state that must be passed to the next statement. In our translation scheme, states are only passed down to callees, and the result type of a function does not change. The cost of the simplicity of this translation is a loss of completeness. We are not interested in preserving all the semantics of input programs. Based on our conjecture that we can refactor programs so that their soundness arguments do not depend on imperative parts, we can get away with a looser translation. In particular, we want to be able to prove properties of the input by proving properties of the translation. We do not need the opposite inclusion to hold.

Soundness of the Specification Extraction. We argue here informally the soundness of the specification extraction for mutable state. In our implementation, the soundness of the code that implements the extraction procedure is assumed. We leave for future work the investigation of ways to relax this assumption. First, we observe that each syntactic occurrence of a function call has its own unique freshstate transformer. This means that, in an execution trace of the specification, each function call has an actual state argument that is obtained by a unique sequence of applications of freshstate transformers to the initial state. Furthermore, in any such function invocation all the syntactic occurrences of a mutable state read use unique freshread access functions, applied to unique values of the state parameter. This means that in any execution trace of the specification, each state read value is abstracted as a unique combination of freshread and freshstate functions. This, in turn, means that for any actual execution trace of the original program, there is a definition of the freshread and freshstate parameters that yields the same results as the actual reads. Since all the freshread and freshstate transformers are left abstract in the specification, any proof about the specification works with any model for the transformers, and thus applies to any execution trace of the original program. A complete formal proof is found in the companion technical report [9].
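To make the final step of this argument concrete, here is one admissible model of the Fig. 5 abstractions. The instantiation is our illustration, not part of the framework:

    (* One possible model of the abstract readers: let the call-stack state
       carry the concrete buffer and index.  Under this definition the
       extracted readu16 computes exactly what the imperative original
       computed, so properties proved of the abstraction carry over to the
       real trace. *)
    type callstate = int array * int
    fun freshread1 ((buff, idx) : callstate) : int = Array.sub (buff, idx)
    fun freshread2 ((buff, idx) : callstate) : int = Array.sub (buff, idx + 1)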
4 Soundness Certification

We use the techniques described in the previous section to convert the ML data type abs to a description of the abstract domain A in the logic of the proof assistant. Similarly, we convert the ainv value into a set A_I ⊆ A. Finally, we model the transition function astep as an abstract transition relation ⇝ ⊆ A × 2^A such that a ⇝ A' whenever astep(a) = Succ A'. We will abuse notation slightly and identify sets and lists where convenient. We prove soundness of the abstract transition relation with respect to a concrete transition relation. Let (C, C0, →) be a transition system for the concrete machine. In particular, C is a domain of states; C0 is the set of allowable initial states; and → is a one-step transition relation. These elements are provided in the proof-assistant logic and are trusted. We build whatever safety policy interests us into → in the usual way: we disallow transitions that would violate the policy, so that errors are modeled as "being stuck." This is the precise way in which one can specify the trusted safety policy for the certified program verifiers (Sect. 5). To certify the soundness of the program analyzer, the certifier developer additionally needs to provide (in the form of a Coq definition) a soundness relation ∼ ⊆ C × A (written as σ in [13]), such that c ∼ a holds if the abstract state a is a sound abstraction of the concrete state c. To demonstrate that ∼ is indeed sound, the author also provides proofs (in Coq) for the following standard, local soundness properties of abstract interpretations and bi-simulations.

Property 1 (Initialization). For every c ∈ C0, there exists a ∈ A_I such that c ∼ a.

The initialization property assures us that the abstract interpretation includes an appropriate invariant for every possible concrete initial state.

Property 2 (Progress). For every c ∈ C and a ∈ A such that c ∼ a, if there exists A' ⊆ A such that a ⇝ A', then there exists c' ∈ C such that c → c'.

Progress guarantees that, whenever an abstract state is not stuck, any corresponding concrete states are also not stuck.

Property 3 (Preservation). For every c ∈ C and a ∈ A such that c ∼ a, if there exists A' ⊆ A such that a ⇝ A', then for every c' ∈ C such that c → c' there exists a' ∈ (A' ∪ A_I) such that c' ∼ a'.

Preservation guarantees that, for every step made by the concrete machine, the resulting concrete state matches one of the successor states of the abstract machine. Preservation is only required when the abstract machine does not reject the program. This allows the abstract machine to reject some safe programs, if it so desires. It is important to notice that, in order to ensure termination, the astep function (and thus the ⇝ relation) only returns those successor abstract states that are not already part of the initial abstract states ainv. To account for this aspect, we use A_I in the preservation theorem. Together, these properties imply the global soundness of the certifier that implements this abstract interpretation [12], stated as follows:

Theorem 1 (Certification soundness). For any concrete state c ∈ C reachable from an initial state in C0, the concrete machine can make further progress. Also, if c has the same program counter as a state a ∈ A_I, then c ∼ a.

In the technical report [9], we give an idea how these obligations are met in practice by sketching how the proof goes for the example of the Java bytecode verifier shown in Fig. 3.
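For reference, the three local obligations can be written compactly as follows; this restatement is ours, using the same $\sim$ for the soundness relation and $\leadsto$ for the abstract transition:

    \begin{align*}
      &\forall c \in C_0.\ \exists a \in A_I.\ c \sim a
        &&\text{(initialization)}\\
      &\forall c\, a.\ \big(c \sim a \wedge \exists A'.\ a \leadsto A'\big)
        \Rightarrow \exists c'.\ c \to c'
        &&\text{(progress)}\\
      &\forall c\, a\, A'\, c'.\ \big(c \sim a \wedge a \leadsto A' \wedge c \to c'\big)
        \Rightarrow \exists a' \in A' \cup A_I.\ c' \sim a'
        &&\text{(preservation)}
    \end{align*}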
5 Applications to Mobile-Code Safety

Language-based security mechanisms have gained acceptance for enforcing basic but essential safety properties, such as memory and type safety, for untrusted mobile code. The most widely deployed solution for mobile-code safety is bytecode verification, as in the Java Virtual Machine (JVM) [25] or the Microsoft Common Intermediate Language (MS-CIL) [18]. A bytecode verifier uses a form of abstract interpretation to track the types of machine registers and to enforce memory and type safety. The main limitation of this approach is that we must trust the soundness of the bytecode verifier. In turn, this means that we cannot easily change the verifier and its enforcement mechanism. This effectively forces the clients of a code receiver to use a fixed type system and often even a fixed source language for mobile code. Programs written in other source languages can be compiled into the trusted intermediate language, but often in unnatural ways, with a loss of expressiveness and performance [4, 19, 7]. A good example is the MS-CIL language, which is expressive enough to be the target of compilers for C#, C, and C++. Compilers for C# produce intermediate code that can be verified, while compilers for C and C++ use intermediate language instructions that are always rejected by the built-in bytecode verifier. In this latter case, the code may be accepted if the producer of the code can provide an explicit proof that the code obeys the required safety policy and the code receiver uses proof-carrying code [1, 20, 27].

Existing work on proof-carrying code (PCC) attests to its versatility, but it often fails to address the essential issue of how the proof objects are obtained. In the Touchstone system [11], proofs are generated by a special theorem prover with detailed knowledge about Java object layout and compilation strategies. The Foundational PCC work [1, 20] eliminates the need to hard-code and trust all such knowledge, but does so at the cost of increasing the proof-generation burden many times over. Both of these systems also incur the cost of transmitting proofs. The Open Verifier project [10] proposes to send with the code not per-program proofs but proof generators, to be run at the code receiver's end for each incoming program. The generated proofs are then checked by a trusted proof checker, as in a standard PCC setup.

Using certified program analyses, we can further improve this process. The producer of the mobile code writes a safety-policy verifier customized for the exact compilation strategy and safety reasoning used in the generation of the mobile code. This verifier can be written in the form of a certified program analysis, whose abstract transition fails whenever it cannot verify the safety of an instruction. For example, we discuss in Sect. 6 cases when the program analysis is a typed assembly language checker, a bytecode verifier, or an actual PCC verification engine relying on annotations accompanying the mobile code. The key element is the soundness proof that accompanies an analysis, which can be checked automatically. At verification time, the now-trusted program analyzer is used to validate the code, with no need to manipulate explicit proof objects. This simplifies the writing of the validator (as compared with the proof-generating theorem prover of Touchstone, or the Open Verifier). We also show in Sect. 6 that this reduces the validation time by more than an order of magnitude.
We point out here that the soundness proof is with respect to the trusted concrete semantics. By adding additional safety checks to the concrete semantics (for instance, the logical equivalents of dynamic checks that would enforce a desired safety policy), the code receiver can construct customized safety policies.

6 Case Studies

In this section, we present case studies of applying certified program analyzers to mobile-code security. We describe experience with verifiers for typed assembly language, Java bytecode, and proof-carrying code. We have developed a prototype implementation of the certified program analysis infrastructure. The concrete language to be analyzed is the Intel x86 assembly language. The specification extractor is built on top of the front-end of the OCaml compiler, and it supports a large fragment of the ML language. The most notable features not supported are the object-oriented features. In addition to the 3000-line extractor, the trusted computing base includes the whole OCaml compiler and the Coq proof checker, neither of which is designed to be foundationally small. However, our focus here has been on exploring the ease of use and run-time efficiency of our approach. We leave minimizing the trusted base for future work.

Typed Assembly Language. Our first realistic use of this framework involved Typed Assembly Language. In particular, we developed and proved correct a verifier for TALx86, as provided in the first release of the TALC tools from Cornell [26]. This TAL includes several interesting features, including continuation, universal, existential, recursive, product, sum, stack, and array types. Our implementation handles all of the features used by the test cases distributed with TALC, with the exception of the modularity features, which we handle by "hand-linking" multiple-file tests into single files. TALC includes compilers to an x86 TAL from Popcorn (a safe C dialect) and mini-Scheme. We used these compilers unchanged in our case study. We implemented a TALx86 verifier in 1500 lines of ML code. This compares favorably with the code size of the TALC type checker, which is about 6000 lines of OCaml. One of us developed our verifier over the course of two months, while simultaneously implementing the certification infrastructure. We expect that it should be possible to construct new verifiers of comparable complexity in a week's time now that the infrastructure is stable.

We also proved the local soundness properties of this implementation in 15,000 lines of Coq definitions and proof scripts. This took about a month, again interleaved with developing the trusted parts of the infrastructure. We re-used some definitions from a previous TAL formalization [10], but we didn't re-use any proofs. It's likely that we can significantly reduce the effort required for such proofs by constructing some custom proof tactics based on our experiences. We don't believe our formalization to be novel in any fundamental way. It uses ideas from previous work on foundational TAL [2, 20, 14]. The main difference is that we prove the same basic theorems about the behavior of an implementation of the type checker, instead of about the properties of inference rules. This makes the proofs slightly more cumbersome, but, as we will see, it brings significant performance improvement. As might be expected, we found and fixed many bugs in the verifier in the course of proving its soundness.
This suggests that our infrastructure might be useful even if the developer is only interested in debugging his analysis.

Table 1 presents some verification-time performance results for our implementation, as average running times for inputs with particular counts of assembly instructions. We ran a number of verifiers on the test cases provided with TALC, which used up to about 9000 assembly instructions. First, the type checker included with TALC finishes within the resolution of our timing technique for all cases, so we don't include results for it. While this type checker operates on a special typed assembly language, the results we give are all for verifying native assembly programs, with types and macro-instructions used as meta-data. As a result, we can expect that there should be some inherent slow-down, since some TAL instructions must be compiled to multiple real instructions. The experiments were performed on an Athlon XP 3000+ with 1 GB of RAM, and times are given in seconds. We give times for "Conventional (Conv)," a thin wrapper around the TALC type checker to make it work on native assembly code; "CPV," our certified program verifier implementation; and "PCC," our TALx86 verifier implementation from previous work [10], in which explicit proof objects are checked during verification.

Table 1. Average verifier running times (in seconds)

    Instructions      Conv   CPV    PCC
    Up to 200 (13)    0      0.01   0.07
    201-999 (7)       0.01   0.02   0.24
    1000 and up (6)   0.04   0.08   1.73

The results show that our CPV verifier performs comparably with the conventional verifier, for which no formal correctness proof exists. It appears our CPV verifier is within a small constant factor of the conventional verifier. This constant is likely due to our use of an inefficient, Lisp-like serialization format for including meta-data in the current implementation. We expect this would be replaced by a much faster binary-encoded system in a more elaborate version. We can also see that the certified verifier performs much better than the PCC version. The difference in performance is due to the cost required to manipulate and check explicit proof objects during verification. To provide evidence that we aren't comparing against a poorly constructed straw man, we can look to other FPCC projects. Wu, Appel, and Stump [32] give some performance results for their Prolog-based implementation of trustworthy verifiers. They only present results on input programs of up to 2000 instructions, with a running time of 0.206 seconds on a 2.2 GHz Pentium IV. This seems on par with our own PCC implementation. While their trusted code base is much smaller than ours, since we require trust in our specification extractor, there is hope that we can achieve a similarly small checking kernel by using techniques related to certifying compilation.

Java Bytecode Verification. We have also used our framework to implement a partial Java Bytecode Verifier (JBV) in about 600 lines of ML. It checks most of the properties that full JBVs check, mainly excluding exceptions, object initialization, and subroutines. Our implementation's structure follows closely that of our running example from Sect. 2. Its ainv begins by calling an OCaml function that calculates a fixed point using standard techniques. As in our example, the precise code here doesn't matter, as the purpose of the function is to populate a hash table of function preconditions and control-flow join point invariants.
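A hypothetical skeleton for such a computation is a textbook iteration to a fixed point; we show it only to emphasize that its code is an untrusted hint:

    (* Untrusted worklist-style iteration: step improves the table of
       per-label invariants; iteration stops at a fixed point.  The
       equality type variable ''a requires comparable tables. *)
    fun fixpoint (step : ''a -> ''a) (start : ''a) : ''a =
      let val next = step start
      in if next = start then start else fixpoint step next end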
With this information, our astep function implements the standard typing rules for JBVs. While we have extracted complete proof obligations for the implementation, we have only begun the process of proving them. However, to make sure we are on track to an acceptable final product, we have performed some simple benchmarks against the bytecode verifier included with Blackdown Java for Linux. We downloaded a few Java-only projects from SourceForge and ran each verifier on every class in each project. On the largest that our prototype implementation could handle, MegaMek, our verifier finishes in 5.5 seconds for checking 668,000 bytecode instructions, compared to 1 second for the traditional verifier. First, we note that both times are relatively small in an absolute sense. It probably takes a user considerably longer to download a software package than to verify it with either method. We also see that our verifier is only a small factor away from matching the traditional approach, whose performance we know empirically that users seem willing to accept. No doubt further engineering effort could close this gap or come close to doing so.

Proof-Carrying Code. We can even implement a version of Foundational PCC in our framework: for each basic block, the mobile code contains an invariant for the start of the block, and a proof that the strongest postcondition of the start invariant along the block implies the invariant of the successor block. The abstract state abs of the certifier consists of a predicate written in a suitable logic, intended to be the strongest postcondition at the given program point. The ainv is obtained by reading invariants from a data segment accompanying the mobile code. Fig. 6 shows a fragment of the code for astep, which calculates the strongest postcondition for every instruction. At a jump, we fetch the invariant for the destination and a proof, and then check the proof.

    fun checkProof (prf: proof, p: pred) : bool = ...

    fun astep (a: abs) : result =
      case instrAt a.pc of
        RegReg(r1, r2) =>
          Succ [{ pc = a.pc + 1;
                  a = And(Eq(r1, r2), Exists(x, [x/r1]a.a)) }]
      | Jump l =>
          let dest = getInvar l in
          let prf = fetchProof l in
          if checkProof (prf, Imply(a.a, dest))
          then Succ [ ]
          else Fail
      | ...

Fig. 6. A fragment of a certifier for PCC

To prove soundness, we only need to ensure that getInvar returns one of the invariants that are part of ainv, and that the checkProof function is sound: more precisely, whenever the call to checkProof returns true, any concrete state that satisfies a.a also satisfies dest. In particular, we do not care at all how fetchProof works: where it gets the proof from, whether it decrypts or decompresses it first, or whether it actually produces the proof itself. This soundness proof for checkProof is possible, and even reasonably straightforward, since we are writing our meta-proofs in Coq's more expressive logic.

7 Related Work

Toward Certified Program Analyses. The Rhodium system developed by Lerner et al. [24] is the most similar with respect to the overall goal of our work, that of providing a realistic framework for certified program analyses. However, they focus on simpler compiler analysis problems whose soundness can be proved by today's automated methods. We expect that our proofs can similarly be automated when our framework is used for the kinds of analyses expressible in Rhodium-style domain-specific languages.
Several systems have been developed for specifying program analyses in domain-specific languages and generating code from these specifications [23]. Again, the expressiveness of these systems is very limited compared to what is needed for standard mobile-code safety problems. In the other direction, we have the well-established body of work dealing with extracting formal verification conditions from programs annotated with specifications. Especially relevant are the Why [16] and Caduceus [17] tools, which produce Coq proof obligations as output. There has been a good amount of work on constructing trustworthy verifiers by extracting their code from constructive proofs of soundness. Cachera et al. [8] extracted a data-flow analysis from a proof based on a general constraint framework. Klein and Nipkow [21] and Bertot [5] have built certified Java bytecode verifiers through program extraction/code generation from programs and proofs in Isabelle and Coq, respectively. None of these publications present any performance figures to suggest that their extracted verifiers scale to real input sizes.

Enforcing Mobile-Code Safety. As alluded to earlier, most prior work in Foundational Proof-Carrying Code has focused on the generality and expressivity of various formalisms, including the original FPCC project [2], Syntactic FPCC [20], and Foundational TALT [14]. These projects have given convincing arguments for their expressiveness, but they have not yet demonstrated a scalable implementation. Some recent research has looked into efficiency considerations in FPCC implementations, including work by Wu, Appel, and Stump [32] and our own work on the Open Verifier [10]. The architecture proposed by Wu, Appel, and Stump is fairly similar to the architecture we propose, with the restriction that verifiers must be implemented in Prolog. In essence, while we build in an abstract interpretation engine, Wu et al. build in a Prolog interpreter. We feel that it is important to support verifiers developed in more traditional programming languages. Also, the performance figures provided by Wu et al. have not yet demonstrated scalability. Our past work on the Open Verifier has heavily influenced the design of the certified program analysis architecture. Both approaches build an abstract interpretation engine into the trusted base and allow the uploading of customized verifiers. However, the Open Verifier essentially adheres to a standard PCC architecture in that it still involves proof generation and checking for each mobile program to be verified, and it pays the usual performance price for doing this.

8 Conclusion

We have presented a strategy for simplifying the task of proving soundness not just of program analysis algorithms, but also of their implementations. We believe that starting with the implementation and extracting natural proof obligations will allow developers to fine-tune non-functional aspects of the code, such as performance or debugging instrumentation. Certified program analyses have immediate applications for developing certified program verifiers, such that even untrusted parties can customize the verification process for untrusted code. We have created a prototype implementation and used it to demonstrate that the same infrastructure can support in a very natural way proof-carrying code, type checking, or data-flow based verification in the style of bytecode verifiers.
Among these, we have completed the soundness proof of a verifier for x86 Typed Assembly Language. The performance of our certified verifier is on par with that of a traditional, uncertified TALx86 type checker. We believe our results here provide the first published evidence that a foundational code certification system can scale.

References

1. A. W. Appel. Foundational proof-carrying code. In Proc. of the 16th Symposium on Logic in Computer Science, pages 247–258, June 2001.
2. A. W. Appel and A. P. Felty. A semantic model of types and machine instructions for proof-carrying code. In Proc. of the 27th Symposium on Principles of Programming Languages, pages 243–253, Jan. 2000.
3. G. Barthe, P. Courtieu, G. Dufay, and S. de Sousa. Tool-assisted specification and verification of the JavaCard platform. In Proc. of the 9th International Conference on Algebraic Methodology and Software Technology, Sept. 2002.
4. N. Benton, A. Kennedy, and G. Russell. Compiling Standard ML to Java bytecodes. In Proc. of the International Conference on Functional Programming, pages 129–140, June 1999.
5. Y. Bertot. Formalizing a JVML verifier for initialization in a theorem prover. In Proc. of the 13th International Conference on Computer Aided Verification, volume 2102 of LNCS, pages 14–24, July 2001.
6. B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival. A static analyzer for large safety-critical software. In Proc. of the Conference on Programming Language Design and Implementation, pages 196–207, 2003.
7. P. Bothner. Kawa — compiling dynamic languages to the Java VM. In Proc. of the FreeNIX Track: USENIX 1998 Annual Technical Conference, 1998.
8. D. Cachera, T. P. Jensen, D. Pichardie, and V. Rusu. Extracting a data flow analyser in constructive logic. In D. A. Schmidt, editor, Proc. of the 13th European Symposium on Programming, volume 2986 of LNCS, pages 385–400, Mar. 2004.
9. B.-Y. E. Chang, A. Chlipala, and G. C. Necula. A framework for certified program analysis and its applications to mobile-code safety. Technical Report UCB ERL M05/32, University of California, Berkeley, 2005.
10. B.-Y. E. Chang, A. Chlipala, G. C. Necula, and R. R. Schneck. The Open Verifier framework for foundational verifiers. In Proc. of the 2nd Workshop on Types in Language Design and Implementation, Jan. 2005.
11. C. Colby, P. Lee, G. C. Necula, F. Blau, M. Plesko, and K. Cline. A certifying compiler for Java. In Proc. of the Conference on Programming Language Design and Implementation, pages 95–107, May 2000.
12. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proc. of the 4th Symposium on Principles of Programming Languages, pages 234–252, Jan. 1977.
13. P. Cousot and R. Cousot. Abstract interpretation frameworks. J. Log. Comput., 2(4):511–547, 1992.
14. K. Crary. Toward a foundational typed assembly language. In Proc. of the 30th Symposium on Principles of Programming Languages, pages 198–212, Jan. 2003.
15. E. W. Dijkstra. Guarded commands, nondeterminacy and formal derivation of programs. Communications of the ACM, 18:453–457, 1975.
16. J.-C. Filliâtre. Why: a multi-language multi-prover verification tool. Research Report 1366, LRI, Université Paris Sud, March 2003.
17. J.-C. Filliâtre and C. Marché. Multi-Prover Verification of C Programs. In Proc.
of the 6th International Conference on Formal Engineering Methods, volume 3308 of LNCS, pages 15–29, Nov. 2004.
18. A. D. Gordon and D. Syme. Typing a multi-language intermediate code. In Proc. of the 28th Symposium on Principles of Programming Languages, pages 248–260, Jan. 2001.
19. K. J. Gough and D. Corney. Evaluating the Java virtual machine as a target for languages other than Java. In Joint Modula Languages Conference, Sept. 2000.
20. N. A. Hamid, Z. Shao, V. Trifonov, S. Monnier, and Z. Ni. A syntactic approach to foundational proof-carrying code. In Proc. of the 17th Symposium on Logic in Computer Science, pages 89–100, July 2002.
21. G. Klein and T. Nipkow. Verified lightweight bytecode verification. Concurrency – Practice and Experience, 13(1), 2001.
22. G. Klein and T. Nipkow. Verified bytecode verifiers. Theor. Comput. Sci., 298(3):583–626, 2003.
23. J. H. E. F. Lasseter. Toolkits for the automatic construction of data flow analyzers. Technical Report CIS-TR-04-03, University of Oregon, 2003.
24. S. Lerner, T. Millstein, E. Rice, and C. Chambers. Automated soundness proofs for dataflow analyses and transformations via local rules. In Proc. of the 32nd Symposium on Principles of Programming Languages, pages 364–377, 2005.
25. T. Lindholm and F. Yellin. The Java Virtual Machine Specification. The Java Series. Addison-Wesley, Reading, MA, USA, Jan. 1997.
26. G. Morrisett, K. Crary, N. Glew, D. Grossman, R. Samuels, F. Smith, D. Walker, S. Weirich, and S. Zdancewic. Talc releases, 2003. URL: http://www.cs.cornell.edu/talc/releases.html.
27. G. C. Necula. Proof-carrying code. In Proc. of the 24th Symposium on Principles of Programming Languages, pages 106–119, Jan. 1997.
28. G. C. Necula, R. Jhala, R. Majumdar, T. A. Henzinger, and W. Weimer. Temporal-safety proofs for systems code. In Proc. of the Conference on Computer Aided Verification, Nov. 2002.
29. L. C. Paulson. Isabelle: A generic theorem prover. Lecture Notes in Computer Science, 828, 1994.
30. E. Rose. Lightweight bytecode verification. J. Autom. Reason., 31(3-4):303–334, 2003.
31. P. Wadler. Monads for functional programming. In Advanced Functional Programming, volume 925 of LNCS, pages 24–52. Springer, 1995.
32. D. Wu, A. W. Appel, and A. Stump. Foundational proof checkers with small witnesses. In Proc. of the 5th International Conference on Principles and Practice of Declarative Programming, pages 264–274, Aug. 2003.

Improved Algorithm Complexities for Linear Temporal Logic Model Checking of Pushdown Systems

Katia Hristova and Yanhong A. Liu
Computer Science Department, State University of New York, Stony Brook, NY 11794
katia@cs.sunysb.edu
This work was supported in part by NSF under grants CCR-0306399 and CCR-0311512 and ONR under grants N00014-04-1-0722 and N00014-02-1-0363.

Abstract. This paper presents a novel implementation strategy for linear temporal logic (LTL) model checking of pushdown systems (PDS). The model checking problem is formulated intuitively in terms of evaluation of Datalog rules. We use a systematic and fully automated method to generate a specialized algorithm and data structures directly from the rules. The generated implementation employs an incremental approach that considers one fact at a time and uses a combination of linked and indexed data structures for facts. We provide a precise time complexity for the model checking problem; it is computed automatically and directly from the rules. We obtain a more precise and simplified complexity analysis, as well as improved algorithm understanding.

1 Introduction

Model checking is a widely used technique for verifying that a property holds for a system.
Systems to be verified can be modeled accurately by pushdown systems (PDS). Properties can be modeled by linear temporal logic (LTL) formulas. LTL is a language commonly used to describe properties of systems [12, 13, 21] and is sufficiently powerful to express many practical properties. Examples include many dataflow analysis problems and various correctness and security problems for programs. This paper focuses on LTL model checking of PDS, specifically on the global model checking problem [15]. The model checking problem is formulated in terms of evaluation of a Datalog program [5]. Datalog is a database query language based on the logic programming paradigm [11, 1]. The Büchi PDS, corresponding to the product of the PDS and the automaton representing the negation of the property, is expressed in Datalog facts, and a reach graph (an abstract representation of the Büchi PDS) is formulated in rules. The method described in [18] generates specialized algorithms, data structures, and complexity formulas for the rules. The generated algorithms and data structures are such that, given a set of facts, they compute all facts that can be inferred. The generated implementation employs an incremental approach that considers one fact at a time and uses a combination of linked and indexed data structures for facts. The running time is optimal, in the sense that each combination of instantiations of hypotheses is considered once in O(1) time. Our main contributions are:

– A novel implementation strategy for the model checking problem that combines an intuitive definition of the model checking problem in rules [5] and a systematic method for deriving efficient algorithms and data structures from the rules [18].
– A precise and automatic time complexity analysis of the model checking problem. The time complexity is calculated directly from the Datalog rules, based on a thorough understanding of the algorithms and data structures generated, reflecting the complexities of implementation back into the rules.

We thus develop a model checker with improved time complexity guarantees and improved algorithm understanding. The rest of this paper is organized as follows. Section 2 defines LTL model checking of PDS. Section 3 expresses the model checking problem by use of Datalog rules. Section 4 describes the generation of a specialized algorithm and data structures from the rules and analyzes the time complexity of the generated implementation. Section 5 discusses related work and concludes.

2 Linear Temporal Logic Model Checking of Pushdown Systems

This section defines the problem of model checking PDS against properties expressed using LTL formulas, as described in [15].

2.1 Pushdown Systems

A pushdown system (PDS) [14] is a triple (CP, SP, TP), where CP is a set of control locations, SP is a set of stack symbols, and TP is a set of transitions.
A transition is of the form (c, s) → (c′, w), where c and c′ are control locations, s is a stack symbol, and w is a sequence of stack symbols; it denotes that if the PDS is in control location c and symbol s is on top of the stack, the control location changes to c′, s is popped from the stack, and the symbols in w are pushed onto the stack, one at a time, from left to right. A configuration of a PDS is a pair (c, w) where c is a control location and w is a sequence of symbols from the top of the stack. If (c, s) → (c′, w) ∈ TP then, for all v ∈ SP*, configuration (c, sv) is said to be an immediate predecessor of (c′, wv). A run of a PDS is a sequence of configurations conf0, conf1, ..., confn such that confi is an immediate predecessor of confi+1, for i = 0, ..., n − 1.

We only consider PDSs where each transition (c, s) → (c′, w) satisfies |w| ≤ 2. Any given PDS can be transformed into such a PDS. Any transition (c, s) → (c′, w) such that |w| > 2 can be rewritten into (c, s) → (c′, whd s′) and (c′, s′) → (c′, wtl), where whd is the first symbol in w, wtl is w without its first symbol, and s′ is a fresh stack symbol. This step can be repeated until all transitions have |w| ≤ 2. This replaces each transition (c, s) → (c′, w), where |w| > 2, with |w| − 1 transitions and introduces |w| − 2 fresh stack symbols.

The procedure calls and returns in a program correspond to a PDS [16]. First, we construct a control flow graph (CFG) [2] of the program. Then, we set up one control location, say called c. Each CFG vertex is a stack symbol. Each CFG edge (s, s′) corresponds to a transition: (i) (c, s) → (c, ε), where ε stands for the empty string, if (s, s′) is labeled with a return statement; (ii) (c, s) → (c, s′ f0), if (s, s′) is labeled with a call to procedure f, and f0 is f's entry point; (iii) (c, s) → (c, s′), otherwise. A run of the program corresponds to a PDS run.

    void s()
      if (drand48() < 0.5): return;
      else: plot_up(); m(); plot_down();

    void m()
      double d = drand48();
      if (d < 0.66):
        s(); plot_right();
        if (d < 0.33): m();
      else:
        plot_up(); m(); plot_down();

    main()
      srand48(time(NULL)); s();

Fig. 1. Example program and corresponding CFG. [The CFG diagram of panel (b), with vertices m0–m9, s0–s5, and main0–main2 and edges labeled with the program's statements, is not reproducible in text.]

Figure 1 shows an example program and its CFG [15]. The program creates random bar graphs using the commands plot_up, plot_right, and plot_down. The corresponding PDS is:

    CP = {c}
    SP = {m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, s0, s1, s2, s3, s4, s5, main0, main1, main2}
    TP = {(c, m3) → (c, m4 s0), (c, m6) → (c, m1 m0), (c, m8) → (c, m9 m0),
          (c, m1) → (c, ε), (c, s2) → (c, ε), (c, s4) → (c, s5 m0),
          (c, s1) → (c, ε), (c, main2) → (c, main1 s0), (c, main1) → (c, ε)}

2.2 Linear Temporal Logic Formulas

Linear temporal logic (LTL) formulas [12, 13, 21] are evaluated over infinite sequences of symbols. The standard logic operators are available; if f and g are formulas, then so are ¬f, f ∧ g, f ∨ g, and f → g. The following additional operators are available: X f: f is true in the next state; F f: f is true in some future state; G f: f is true globally, i.e. in all future states; g U f: g is true in all future states until f is true in some future state.
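Before turning to Büchi automata, a brief aside: the |w| ≤ 2 normalization of Sect. 2.1 can be sketched in a few lines of Python. This is our own illustration, not part of the paper's method; the tuple encoding of transitions and the fresh-symbol helper are assumptions made for the sketch.

    # Sketch: normalize PDS transitions so every pushed word w has |w| <= 2.
    # A transition (c, s) -> (c2, w) is encoded as the tuple (c, s, c2, w),
    # where w is a tuple of stack symbols; fresh() generates unused symbols.

    def normalize(transitions):
        count = 0
        def fresh():
            nonlocal count
            count += 1
            return "#aux" + str(count)   # assumed not to clash with SP

        result, work = [], list(transitions)
        while work:
            c, s, c2, w = work.pop()
            if len(w) <= 2:
                result.append((c, s, c2, w))
            else:
                s_new = fresh()
                # (c, s) -> (c2, w_hd s_new); s_new is popped next and
                # replaced by w_tl, which may itself need further splitting.
                result.append((c, s, c2, (w[0], s_new)))
                work.append((c2, s_new, c2, w[1:]))
        return result

For a transition with |w| = n > 2 this produces n − 1 transitions and n − 2 fresh symbols, matching the count given above.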
An LTL formula can be translated to a Büchi automaton, a finite-state automaton over infinite words. The automaton accepts a word if, on reading it, a good state is entered infinitely many times. Formally, a Büchi automaton (BA) is a tuple (CB, LB, TB, C0B, GB) where CB is a set of states, LB is a set of transition labels, TB is a set of transitions, C0B ⊆ CB is a set of starting states, and GB ⊆ CB is a set of good states. A transition is of the form (c, l, c′), where c, c′ ∈ CB and l ∈ LB. The label of a transition is a condition that must be met by the current symbol in the word being read in order for the transition to be possible. The label _ denotes an unconditional transition. An accepting run of a Büchi automaton is an infinite sequence of transitions (c0, l0, c1), (c1, l1, c2), . . . in which some state ci ∈ GB appears infinitely many times.

To specify a program property using an LTL formula, the program's CFG edges are used as atomic propositions. LTL formulas are defined with respect to infinite runs of the program. The corresponding BA accepts an infinite sequence of CFG edges if, on reading it, the automaton enters a good state infinitely many times. For example, the property that plotting up is never immediately followed by plotting down is expressed by the LTL formula F = G(plot_up → X(¬plot_down)). The BA corresponding to ¬F is shown in Figure 2; it was generated with the tool LBT, which translates LTL formulas to Büchi automata (http://www.tcs.hut.fi/Software/maria/tools/lbt/). In the diagram, nodes correspond to states and edges correspond to transitions of the BA; double circles mark good states and a square marks the start state.

Fig. 2. Büchi automaton corresponding to ¬G(plot_up → X(¬plot_down)). [Diagram with states c1–c5; edges are labeled plot_up, plot_down, or _ for unconditional transitions.]

2.3 LTL Model Checking of PDS

Given a system expressed as a PDS P and an LTL formula F, the formula F holds for P if it holds for every run of P. We check whether F holds for P as follows [15]. First, we construct B — the BA corresponding to ¬F. Second, we construct BP — a Büchi PDS that is the product of P and B, and make sure BP has no accepting run. A Büchi PDS (BPDS) is a tuple (C, S, T, C0, G), where C is a set of control locations, S is a set of stack symbols, T is a set of transitions, C0 ⊆ C is the set of starting control locations, and G ⊆ C is the set of good control locations. Transitions are of the form ((C × S) × (C × S*)). The concepts of configuration, predecessor, and run of a BPDS are analogous to those of a PDS. An accepting run of the BPDS is an infinite sequence of configurations in which configurations with control locations in G appear infinitely many times. The product BPDS BP of P = (CP, SP, TP) and B = (CB, LB, TB, C0B, GB) is the five-tuple ((CP × CB), SBP, TBP, C0BP, GBP), where (((cP, cB), s), ((c′P, c′B), w)) ∈ TBP if (cP, s) → (c′P, w) ∈ TP, there exists f such that (cB, f, c′B) ∈ TB, and f is true at configuration ((cP, cB), s); (cP, cB) ∈ C0BP if cB ∈ C0B; and (cP, cB) ∈ GBP if cB ∈ GB.

Next we construct a reach graph — a finite graph that abstracts BP. The nodes of the graph are configurations of BP. An edge ((c, s), (c′, s′)) in the reach graph corresponds to a run that takes BP from configuration (c, s) to configuration (c′, s′). If a good control location in BP is visited in the run corresponding to an edge, the edge is said to be good. A path in the reach graph is a sequence of edges.
Cycles in the reach graph correspond to infinite runs of BP. Paths containing cycles with good edges in them correspond to accepting runs of BP and are said to be good. If the reach graph corresponding to BP has no good paths, BP has no accepting runs and F holds for P. Otherwise, the good paths in the reach graph are counterexamples showing that F does not hold for P.

3 Specifying the Reach Graph in Rules and Detecting Good Paths

This section expresses the reach graph using Datalog rules and employs an algorithm for detecting good paths in the reach graph as presented in [5]. A Datalog program is a finite set of relational rules of the form

    p1(x11, ..., x1a1) ∧ ... ∧ ph(xh1, ..., xhah) → q(x1, ..., xa)

where h is a natural number, each pi (respectively q) is a relation of ai (respectively a) arguments, each xij and xk is either a constant or a variable, and the variables in the xk's must be a subset of the variables in the xij's. If h = 0, then there are no pi's or xij's, and the xk's must be constants, in which case q(x1, ..., xa) is called a fact. The meaning of a set of rules and a set of facts is the smallest set of facts that contains all the given facts and all the facts that can be inferred, directly or indirectly, using the rules.

Expressing the Büchi PDS. The BPDS is expressed by the relations loc, trans0, trans1, and trans2. The loc relation represents the control locations of the BPDS; its arguments are a control location and a boolean argument indicating whether the control location is good. One instance of the relation exists for each control location. The three relations trans0, trans1, and trans2 express transitions. The facts trans0(c1,s1,c2), trans1(c1,s1,c2,s2), and trans2(c1,s1,c2,s2,s3), where the ci's are control locations and the si's are stack symbols, denote transitions of the form ((c, s), (c′, w)) with w ∈ SBP*, such that |w| = 0, |w| = 1, and |w| = 2, respectively. or is a relation with three boolean arguments; in the fact or(x1,x2,r), the argument r is the value of the logical or of the arguments x1 and x2.

Expressing the edges of the reach graph. The reach graph is expressed by the relations erase and edge. The fact erase(c1,s1,g,c2) denotes a run of BP from configuration (c1, s1) to configuration (c2, ε). The third element in the tuple is a boolean value that indicates whether the corresponding run goes through a good control location. The edge relation represents the reach graph edges. edge(c1,s1,g,c2,s2) denotes an edge between nodes (c1, s1) and (c2, s2); g is a boolean argument indicating whether the edge is good. For a BPDS (CBP, SBP, TBP, C0BP, GBP), erase and edge are the relations satisfying:

i. (c1, s, g, c2) ∈ erase if (c1, s) → (c2, ε) ∈ TBP, where g = true if c1 ∈ GBP and false otherwise
ii. (c1, s1, g1 ∨ g2, c3) ∈ erase if (c1, s1) → (c2, s2) ∈ TBP and (c2, s2, g2, c3) ∈ erase, where g1 = true if c1 ∈ GBP and false otherwise
iii. (c1, s1, g1 ∨ g2 ∨ g3, c4) ∈ erase if (c1, s1) → (c2, s2s3) ∈ TBP, (c2, s2, g2, c3) ∈ erase, and (c3, s3, g3, c4) ∈ erase, where g1 = true if c1 ∈ GBP and false otherwise

and

i. (c1, s1, g, c2, s2) ∈ edge if (c1, s1) → (c2, s2) ∈ TBP, where g = true if c1 ∈ GBP and false otherwise
ii. (c1, s1, g, c2, s2) ∈ edge if (c1, s1) → (c2, s2s3) ∈ TBP, where g = true if c1 ∈ GBP and false otherwise
iii. (c1, s1, g1 ∨ g2, c3, s3) ∈ edge if (c1, s1) → (c2, s2s3) ∈ TBP and (c2, s2, g2, c3) ∈ erase, where g1 = true if c1 ∈ GBP and false otherwise

In model checking of programs, the relation erase summarizes the effects of procedures. The three parts of the above definition correspond to the program execution exiting, proceeding within, or entering a procedure. The definitions of the erase and edge relations can be readily written as rules. These rules are shown in Figure 3.

Detecting good paths. Checking that the BPDS accepts the empty language amounts to checking that the resulting reach graph has no good paths. To find good paths in the reach graph we use the algorithm presented in [5, Figure 4], but ignore the resource labels considered by that algorithm. The algorithm uses depth-first search and is linear in the number of edges in the reach graph.

    trans0(c1,s1,c2) ∧ loc(c1,g) → erase(c1,s1,g,c2)
    trans1(c1,s1,c2,s2) ∧ erase(c2,s2,g2,c3) ∧ loc(c1,g1) ∧ or(g1,g2,g) → erase(c1,s1,g,c3)
    trans2(c1,s1,c2,s2,s3) ∧ erase(c2,s2,g2,c3) ∧ erase(c3,s3,g3,c4) ∧ loc(c1,g1) ∧ or(g1,g2,g4) ∧ or(g4,g3,g) → erase(c1,s1,g,c4)
    trans1(c1,s1,c2,s2) ∧ loc(c1,g) → edge(c1,s1,g,c2,s2)
    trans2(c1,s1,c2,s2,s3) ∧ loc(c1,g) → edge(c1,s1,g,c2,s2)
    trans2(c1,s1,c2,s2,s3) ∧ erase(c2,s2,g2,c3) ∧ loc(c1,g1) ∧ or(g1,g2,g) → edge(c1,s1,g,c3,s3)

Fig. 3. Rules corresponding to the erase relation used to construct the reach graph, and the edge relation of the reach graph

4 Efficient Algorithm for Computing the Reach Graph

This section describes the generation of a specialized algorithm and data structures for computing the reach graph from the rules shown in the previous section. It also analyzes precisely the time complexity of computing the reach graph and expresses the complexity in terms of characterizations of the facts—the parameters characterizing the BPDS.

4.1 Generation of Efficient Algorithms and Data Structures

Transforming the set of rules into an efficient implementation uses the method in [18]. We first transform each rule with more than two hypotheses into multiple rules with two hypotheses each, and then carry out three key steps. Step 1 transforms the least fixed point (LFP) specification of the rule set into a while-loop. Step 2 transforms expensive set operations in the loop into incremental operations. Step 3 designs appropriate data structures for each set, so that operations on it can be implemented efficiently. These three steps correspond to dominated convergence [10], finite differencing [20], and real-time simulation [19], respectively, as studied by Paige et al.

Auxiliary relations. For each rule with more than two hypotheses, we transform it into multiple rules with two hypotheses each. The transformation introduces auxiliary relations with the necessary arguments to combine two hypotheses at a time. We repeatedly apply the following transformation to each rule with more than two hypotheses until only rules with at most two hypotheses are left. We replace any two hypotheses of the rule, say Pi(Xi1, ..., Xiai) and Pj(Xj1, ..., Xjaj), by a new hypothesis Q(X1, ..., Xa), where Q is a fresh relation and the Xk's are variables in the arguments of Pi or Pj that also occur in the arguments of other hypotheses or the conclusion of this rule. We add a new rule: Pi(Xi1, ..., Xiai) ∧ Pj(Xj1, ..., Xjaj) → Q(X1, ..., Xa). The resulting rule set for constructing the reach graph is shown in Figure 4. Several auxiliary relations have been introduced.
The relations gtrans1 and gtrans2 represent transitions like trans1 and trans2, respectively, but an extra argument indicates whether the transition starts at a good control location. The relations gtrans1e and gtrans2e represent runs of the BPDS starting with a transition trans1 or trans2, respectively, followed by a run represented as a fact of the erase relation. The facts gtrans1e(c1,s1,c2,g1,g2) and gtrans2e(c1,s1,s2,c2,g1,g2) represent runs from configuration (c1, s1) to configurations (c2, ε) and (c2, s2), respectively, where g1 and g2 indicate, respectively, whether the first control location in the run is good and whether the rest of the run visits a good control location. The relation gtrans2ee represents runs consisting of one transition and two runs expressed as facts of the erase relation. The fact gtrans2ee(c1,s1,c2,g1,g2,g3) stands for a run from configuration (c1, s1) to configuration (c2, ε); the arguments g1, g2, and g3 are booleans indicating, respectively, whether the first control location in the run is good, and whether each of the remaining two parts of the run visits a good control location. The relation gtrans2ee_or represents runs like gtrans2ee, except with two of the boolean arguments combined using logical or.

    1. loc(c1,g) ∧ trans0(c1,s1,c2) → erase(c1,s1,g,c2)
    2. loc(c1,g1) ∧ trans1(c1,s1,c2,s2) → gtrans1(c1,g1,s1,c2,s2)
    3. gtrans1(c1,g1,s1,c2,s2) ∧ erase(c2,s2,g2,c3) → gtrans1e(c1,s1,c3,g1,g2)
    4. gtrans1e(c1,s1,c3,g1,g2) ∧ or(g1,g2,g) → erase(c1,s1,g,c3)
    5. loc(c1,g1) ∧ trans2(c1,s1,c2,s2,s3) → gtrans2(c1,g1,s1,c2,s2,s3)
    6. gtrans2(c1,g1,s1,c2,s2,s3) ∧ erase(c2,s2,g2,c3) → gtrans2e(c1,s1,s3,c3,g1,g2)
    7. gtrans2e(c1,s1,s2,c3,g1,g2) ∧ erase(c3,s2,g3,c4) → gtrans2ee(c1,s1,c4,g1,g2,g3)
    8. gtrans2ee(c1,s1,c4,g1,g2,g3) ∧ or(g1,g2,g4) → gtrans2ee_or(c1,s1,c4,g3,g4)
    9. gtrans2ee_or(c1,s1,c4,g3,g4) ∧ or(g4,g3,g) → erase(c1,s1,g,c4)
    10. gtrans1(c1,g,s1,c2,s2) → edge(c1,s1,g,c2,s2)
    11. gtrans2(c1,g,s1,c2,s2,s3) → edge(c1,s1,g,c2,s2)
    12. gtrans2e(c1,s1,s2,c2,g1,g2) ∧ or(g1,g2,g) → edge(c1,s1,g,c2,s2)

Fig. 4. The reach graph expressed in rules with at most two hypotheses

Fixed-point specification and while-loop. We represent a fact of the form Q(a1, a2, ..., an) using a tuple of the form [Q,a1,a2,...,an]. We use S with X and S less X to mean S ∪ {X} and S − {X}, respectively. We use the notation {X : Y1 in S1, ..., Yn in Sn | Z} for set comprehension. Each Yi enumerates elements of Si; for each combination of Y1, ..., Yn, if the value of the boolean expression Z is true, then the value of expression X forms an element of the resulting set. If Z is omitted, it is implicitly the constant true. LFP(S0, F) denotes the minimum element S, with respect to the subset ordering ⊆, that satisfies the conditions S0 ⊆ S and F(S) = S. We use the standard control constructs while, for, if, and case, and we use indentation to indicate scope. We abbreviate X := X op Y as X op:= Y.

We use the set bpds for the set of facts representing the BPDS:

    rbpds = {[loc,c1,g] : loc(c1,g) in bpds}
          ∪ {[trans0,c1,s1,c2] : trans0(c1,s1,c2) in bpds}
          ∪ {[trans1,c1,s1,c2,s2] : trans1(c1,s1,c2,s2) in bpds}
          ∪ {[trans2,c1,s1,c2,s2,s3] : trans2(c1,s1,c2,s2,s3) in bpds}

Given any set of facts R, and a rule with rule number n and with relation e in the conclusion, let ne(R), referred to as a result set, be the set of all facts that can be inferred by rule n given the facts in R.
For example,

    2gtrans1(R) = {[gtrans1,c1,g,s1,c2,s2] : [loc,c1,g] in R and [trans1,c1,s1,c2,s2] in R},
    10edge(R) = {[edge,c1,s1,g,c2,s2] : [gtrans1,c1,g,s1,c2,s2] in R}.

The meaning of the given facts and the rules used to compute the reach graph is LFP({}, F), where

    F(R) = rbpds ∪ 1erase(R) ∪ 2gtrans1(R) ∪ 3gtrans1e(R) ∪ 4erase(R)
         ∪ 5gtrans2(R) ∪ 6gtrans2e(R) ∪ 7gtrans2ee(R) ∪ 8gtrans2ee_or(R)
         ∪ 9erase(R) ∪ 10edge(R) ∪ 11edge(R) ∪ 12edge(R).

This least-fixed-point specification of computing the reach graph is transformed into the following while-loop:

    R := {};
    while exists x in F(R) - R:
        R with:= x;                                    (1)

The idea behind this transformation is to perform small update operations in each iteration of the while-loop.

Incremental computation. Next we transform expensive set operations in the loop into incremental operations. The idea is to replace each expensive expression exp in the loop with a variable, say E, and maintain the invariant E = exp by inserting appropriate initializations and updates to E where the variables in exp are initialized and updated, respectively. The expensive expressions in constructing the reach graph are all the result sets, such as 2gtrans1(R), and F(R) - R. We use fresh variables to hold each of their respective values and maintain the following invariants:

    Ibpds = rbpds, I1erase = 1erase(R), I2gtrans1 = 2gtrans1(R),
    I3gtrans1e = 3gtrans1e(R), I4erase = 4erase(R), I5gtrans2 = 5gtrans2(R),
    I6gtrans2e = 6gtrans2e(R), I7gtrans2ee = 7gtrans2ee(R),
    I8gtrans2ee_or = 8gtrans2ee_or(R), I9erase = 9erase(R),
    I10edge = 10edge(R), I11edge = 11edge(R), I12edge = 12edge(R),
    W = F(R) - R.

W serves as the workset. As an example of incremental maintenance of the value of an expensive expression, consider maintaining the invariant I2gtrans1. I2gtrans1 is the value of the set formed by joining elements from the sets of facts of the loc and trans1 relations. I2gtrans1 can be initialized to {} with the initialization R = {}. To update I2gtrans1 incrementally upon the update R with:= x, if x is of the form [loc,c1,g], we consider all matching tuples of the form [trans1,c1,s1,c2,s2] and add the tuple [gtrans1,c1,g,s1,c2,s2] to I2gtrans1. To form the tuples to add, we need to efficiently find the appropriate values of the variables that occur in [trans1,c1,s1,c2,s2] tuples but not in [loc,c1,g], i.e. the values of s1, c2, and s2, so we maintain an auxiliary map from [c1] to [s1,c2,s2] in the variable I2gtrans1_trans1 shown below. Symmetrically, if x is a tuple of the form [trans1,c1,s1,c2,s2], we need to consider every matching tuple [loc,c1,g] and add the corresponding tuple [gtrans1,c1,g,s1,c2,s2] to I2gtrans1; for this we maintain the auxiliary map I2gtrans1_loc. The first set of elements in an auxiliary map is referred to as the anchor and the second set of elements as the non-anchor.

    I2gtrans1_trans1 = {[[c1], [s1,c2,s2]] : [trans1,c1,s1,c2,s2] in R},
    I2gtrans1_loc = {[[c1], [g]] : [loc,c1,g] in R}.

Thus, we are able to directly find only matching tuples and consider only combinations of facts that make both hypotheses true simultaneously, as well as consider each combination only once. Such auxiliary maps are maintained similarly for all the invariants. All variables holding the values of the expensive computations listed above and all auxiliary maps are initialized together with the assignment R := {} and updated incrementally together with the assignment R with:= x in each iteration. When R is {}, Ibpds = rbpds, all auxiliary maps are initialized to {}, and W = Ibpds.
When a fact is added to R in the loop body, the variables are updated. We show the update for the addition of a fact of relation loc, and only for the I2gtrans1 invariant and the I2gtrans1_loc auxiliary map, since other facts and updates to the variables and auxiliary maps are processed in the same way. The notation E{Ys}, where E = {[Ys,Xs]} is an auxiliary map, is used to access all matching tuples of E and return all matching values of Xs.

    case x of
      [loc,c1,g]:
        I2gtrans1 +:= {[gtrans1,c1,g,s1,c2,s2] : [s1,c2,s2] in I2gtrans1_trans1{c1}};
        W +:= {[gtrans1,c1,g,s1,c2,s2] : [s1,c2,s2] in I2gtrans1_trans1{c1}
               | [gtrans1,c1,g,s1,c2,s2] notin R};
        I2gtrans1_loc with:= [[c1], [g]];              (2)

Using the above initializations and updates, and replacing the expensive expression F(R) - R with the variable W, we obtain the following complete code:

    initialization;
    R := {};
    while exists x in W:
        update;
        W less:= x;
        R with:= x;                                    (3)

We next eliminate dead code and clean up the code to contain only uniform operations and set elements for data structure design. We then decompose R and W into several sets, each corresponding to a single relation that occurs in the rules. R is decomposed into Rtrans0, Rtrans1, Rtrans2, Rloc, Rerase, Rgtrans1, Rgtrans1e, Rgtrans2, Rgtrans2e, Rgtrans2ee, Rgtrans2ee_or, and Redge. W is decomposed in the same way. We eliminate relation names from the first component of tuples and transform the while-clause and case-clause appropriately. Then, we perform the following three sets of transformations. We transform operations on sets into loops that use operations on set elements. Each addition of a set is transformed into a for-loop that adds the elements one at a time. For example,

    I2gtrans1 +:= {[c1,g,s1,c2,s2] : [s1,c2,s2] in I2gtrans1_trans1{c1}}

is transformed into:

    for [s1,c2,s2] in I2gtrans1_trans1{c1}:
        I2gtrans1 +:= [c1,g,s1,c2,s2];

We replace tuples and tuple operations with maps and map operations. We make all element additions and deletions easy by testing membership first.

Data structures. After the above transformations, each firing of a rule takes a constant number of set operations. Since each of these set operations takes worst-case constant time in the generated code, achieved as described below, each firing of a rule takes worst-case constant time. Next we describe how to guarantee that each set operation takes worst-case constant time. The operations are of the following kinds: set initialization S := {}; computing an image set M{X}; element retrieval for X in S and while exists X in S; membership tests X in S and X notin S; and element addition S with X and deletion S less X. We use associative access to refer to membership testing and computing image sets. A uniform method is used to represent all sets and maps: arrays for sets that require associative access, linked lists for sets that are traversed by loops, and both arrays and linked lists when both kinds of operations are needed.

The result sets, such as Rtrans0, are represented by nested array structures. Each result set of, say, a components is represented using an a-level nested array structure. The first level is an array indexed by values in the domain of the first component of the result set; the k-th element of the array is null if there is no tuple of the result set whose first component has value k, and otherwise is true if a = 1, and otherwise is recursively an (a−1)-level nested array structure for the remaining components of the tuples of the result set whose first component has value k.
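As a rough illustration of the shape of the generated code, the following Python sketch specializes the fact-at-a-time loop (3) to rule 2 alone, using dicts with default hashing in place of the nested array structures described above (the paper's own experiments also use generated Python code). It is our sketch, not the authors' generated code; only the names follow the invariants above.

    # Sketch of the fact-at-a-time loop (3), specialized to rule 2:
    #   loc(c1,g) /\ trans1(c1,s1,c2,s2) -> gtrans1(c1,g,s1,c2,s2)
    # Dicts stand in for the nested arrays; illustrative only.
    from collections import defaultdict

    def compute_gtrans1(loc_facts, trans1_facts):
        R = set()                                   # all facts inferred so far
        W = list(loc_facts) + list(trans1_facts)    # workset, seeded with given facts
        # auxiliary maps: anchor [c1] -> non-anchor components
        I2gtrans1_trans1 = defaultdict(list)        # c1 -> [(s1, c2, s2)]
        I2gtrans1_loc = defaultdict(list)           # c1 -> [g]

        while W:
            x = W.pop()
            if x in R:
                continue
            R.add(x)
            if x[0] == "loc":
                _, c1, g = x
                I2gtrans1_loc[c1].append(g)
                for (s1, c2, s2) in I2gtrans1_trans1[c1]:
                    y = ("gtrans1", c1, g, s1, c2, s2)
                    if y not in R:
                        W.append(y)
            elif x[0] == "trans1":
                _, c1, s1, c2, s2 = x
                I2gtrans1_trans1[c1].append((s1, c2, s2))
                for g in I2gtrans1_loc[c1]:
                    y = ("gtrans1", c1, g, s1, c2, s2)
                    if y not in R:
                        W.append(y)
        return {f for f in R if f[0] == "gtrans1"}

    # Example: compute_gtrans1({("loc", "c", False)},
    #                          {("trans1", "c", "m3", "c", "m4")})

The index maps ensure that each combination of a loc fact and a trans1 fact is considered exactly once, in O(1) time, as claimed in Sect. 1.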
The worksets, such as Wtrans0, are represented by arrays and linked lists. Each workset is represented in the same way as the corresponding result set, with two additions. First, for each array we add a linked list linking the indices of the non-null elements of the array. Second, to each linked list we add a tail pointer. One or more records are used to put each array, linked list, and tail pointer together. Alternatively, each workset can be represented simply as a nested queue structure (without the underlying arrays), one level for each component, linking the elements (which correspond to indices of the arrays) directly.

Auxiliary maps, such as I2gtrans1_trans1 and I2gtrans1_loc, are implemented as follows. Each auxiliary map, say E, for a relation that appears in a rule's conclusion uses a nested array structure, as result sets and worksets do, and additionally linked lists for each component of the non-anchor, as worksets do. E uses a nested array structure only for the anchor, where the elements of the arrays of the last component of the anchor are each a nested linked-list structure for the non-anchor.

4.2 Complexity Analysis of the Model Checking Problem

We analyze the time complexity of the model checking problem by carefully bounding the number of facts actually used by the rules. For each rule we determine precisely the number of facts processed by it, avoiding approximations that use the sizes of individual argument domains.

Calculating time complexity. We first define the size parameters used to characterize relations and analyze complexity. For a relation r, we refer to the number of facts of r that are given or can be inferred as r's size. The parameters #trans0, #trans1, and #trans2 denote the numbers of transitions of the form ((c1, s1), (c2, ε)), ((c1, s1), (c2, s2)), and ((c1, s1), (c2, s2s3)), respectively; #trans denotes the total number of transitions. The parameters #gtrans1 and #gtrans2 denote the numbers of facts of relations gtrans1 and gtrans2, where #gtrans1 = #trans1 and #gtrans2 = #trans2. The parameters #gtrans1e and #gtrans2e denote the corresponding relation sizes — #trans1 * #target_loc_trans0 and #trans2 * #target_loc_trans0, respectively — and #gtrans2ee denotes the corresponding relation size, equal to #trans2 * #target_loc_trans0^2. The parameter #erase denotes the number of facts in the erase relation; #erase.4/123 denotes the number of different values the fourth argument of erase can take for each combination of values of the first three arguments. In the worst case, this is the number of control locations c2 such that a transition of the form ((c1, s1), (c2, ε)) exists in the automaton. We use #target_loc_trans0 to denote this number.

The time complexity for the set of rules is the total number of combinations of hypotheses considered in evaluating the rules. For each rule r, r.#firedTimes stands for the number of firings of the rule, which is a count of: (i) for rules with one hypothesis, the number of facts that make the hypothesis true; (ii) for rules with two hypotheses, the number of combinations of facts that make the two hypotheses simultaneously true. The total time complexity is the time for reading the input, i.e. O(#trans + #loc), plus the time for applying each rule, shown in the second column of the first table in Figure 5.

Time complexity of model checking PDS. The time complexity for processing each of the rules is shown in the first table of Figure 5; the resulting complexities of computing the erase and edge relations are shown in the second table of Figure 5.
After the reach graph has been computed, good cycles in the reach graph can be detected in time linear in the size of the reach graph, i.e. O(#edge). Thus, the asymptotic complexity of the model checking problem is dominated by the time complexity of computing the erase relation.

    rule no  time complexity                                       time complexity bound
    1        min(#trans0*1, #loc*#trans0.23/1)                     #trans0
    2        min(#loc*#trans1.234/1, #trans1*1)                    #trans1
    3        min(#gtrans1*#erase.4/123, #erase*#gtrans1.12/34)     #trans1*#target_loc_trans0
    4        min(#gtrans1e*1, 1*#gtrans1e)                         #trans1*#target_loc_trans0
    5        min(#loc*#trans2.2345/1, #trans2*1)                   #trans2
    6        min(#gtrans2*#erase.4/123, #erase*#gtrans2.12/345)    #trans2*#target_loc_trans0
    7        min(#gtrans2e*#erase.4/123, #erase*#gtrans2e.12/345)  #trans2*#target_loc_trans0^2
    8        min(#gtrans2ee*1, 1*#gtrans2ee)                       #trans2*#target_loc_trans0^2
    9        min(#gtrans2ee_or*1, 1*#gtrans2ee_or)                 #trans2*#target_loc_trans0^2
    10       min(#gtrans2ee_or*1, 1*#gtrans2ee_or)                 #trans2*#target_loc_trans0^2
    11       #gtrans1                                              #trans1
    12       #gtrans2                                              #trans2
    13       min(#gtrans2e*1, 1*#gtrans2e)                         #trans2*#target_loc_trans0

    relation  time complexity
    erase     O(#trans0 + #trans1*#target_loc_trans0 + #trans2*#target_loc_trans0^2)
    edge      O(#trans1 + #trans2*#target_loc_trans0)

Fig. 5. Time complexity of computing the reach graph

For a BPDS that is the product of P = (CP, SP, TP), where |CP| = 1, and B = (CB, LB, TB, C0B, GB), we have #target_loc_trans0 ≤ |CB| and #trans2 ≤ |TP| * |TB|. For such a PDS, O(|TP| * |TB| * |CB|^2) is the worst-case time complexity of computing the erase relation, and O(|TP| * |TB| * |CB|) is the worst-case time complexity of computing the edge relation. Since only |TP| depends on the size of P, the time complexity is linear in the size of P and cubic in the size of B.

4.3 Performance

We tested the performance of our reach graph construction algorithm on two sets of BPDS with increasing #trans. BPDS in the first set also had increasing #target_loc_trans0, while BPDS in the second set had constant #target_loc_trans0. The time complexity for computing reach graphs for BPDS in the first set is as shown in Figure 5. However, for automata in the second set the time complexity should be linear — O(#trans). If the PDS corresponds to a program, #target_loc_trans0 is proportional to the total number of return points of procedures in the program. Thus, our test data corresponds to checking whether a property holds on programs with an increasing number of statements and procedure calls, and on programs with an increasing number of statements but a constant number of procedures.

Fig. 6. Results for computing the reach graph for the BPDS. [Plot of CPU time in seconds (0–50) against number of transitions (0–5000), with one curve for increasing #trans and increasing #target_loc_trans0 and one for increasing #trans but constant #target_loc_trans0.]

Results of the experiment are shown in Figure 6 and confirm our analysis. We used generated Python code in which each operation on set elements is guaranteed to take constant time on average, using Python's default hashing. Running times are measured in seconds on a 500 MHz Sun Blade 100 with 256 megabytes of RAM, running SunOS 5.8, and are averaged over ten runs.

5 Discussion

The problem of LTL model checking of PDS has been extensively researched, especially model checking of PDS induced by CFGs of programs. The model checking problem for context-free and pushdown processes is explored in [8].
The design and implementation of Bebop, a symbolic model checker for boolean programs, is presented in [4]. Burkart and Steffen [9] present a model checking algorithm for modal mu-calculus formulas. For a PDS with one control state, a modal mu-calculus formula of alternation depth k can be checked in time O(n^k), where n is the size of the PDS. The works [17, 16, 15, 7] describe efficient algorithms for model checking PDSs. Alur et al. [3] and Benedikt et al. [6] show that state machines can be used to model the control flow of sequential programs. Both works describe algorithms for model checking PDS that have time complexity cubic in the size of the BA and linear in the size of the PDS; these works combine forward and backward reachability and obtain their complexity estimates by exploiting this mixture. Esparza et al. [15] estimate the time complexity of solving the model checking problem for PDS with one control state to be O(n * m^3), where n is the size of the PDS and m is the size of the property BA. While this is also linear in the size of the PDS, our time complexity analysis is more precise and automatic.

The algorithm derived in this work is essentially the same as the one in [15]. What distinguishes our work is that we use a novel implementation strategy for the model checking problem that combines an intuitive definition of the model checking problem in rules [5] and a systematic method for deriving efficient algorithms and data structures from the rules [18], and arrives at an improved complexity analysis. The time complexity is calculated directly from the Datalog rules, based on a thorough understanding of the algorithms and data structures generated, reflecting the complexities of implementation back into the rules.

An implementation of the model checking problem in logical rules is presented in [5]. The rules are evaluated using the XSB system [23]. Thus, the efficiency of the computation is highly dependent on the order of hypotheses in the given rules. Our implementation is drastically different, as it finds the best order of hypotheses in the rules automatically. We do not employ a general evaluation strategy for Datalog, but generate a specialized algorithm and implementation directly from the rules.

In this paper, we presented an efficient algorithm for LTL model checking of PDS. We showed the effectiveness of our approach through a precise time complexity analysis, along with experiments. These results show that our model checking algorithm can help accommodate larger PDS and properties. Our work is potentially a contribution beyond the model checking problem, since the idea behind the erase relation and the reach graph is more universal than model checking of PDS. Variants of the erase relation are used in dataflow analysis techniques, as described in [22] and related work. Applications of model checking in dataflow analysis are presented in [25, 24]. It is a topic of future research to apply our method to dataflow analysis problems.

Acknowledgment. Thanks to Tom Rothamel for helping debug performance problems in the implementation.

References

1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
2. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.
3. R. Alur, K. Etessami, and M. Yannakakis. Analysis of recursive state machines. In CAV '01: Proceedings of the 13th International Conference on Computer Aided Verification, pages 207–220, London, UK, 2001.
Springer-Verlag.
4. T. Ball and S. K. Rajamani. Bebop: A symbolic model checker for boolean programs. In SPIN, pages 113–130, 2000.
5. S. Basu, K. N. Kumar, L. R. Pokorny, and C. R. Ramakrishnan. Resource-constrained model checking of recursive programs. In TACAS '02: Proceedings of the 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 236–250, London, UK, 2002. Springer-Verlag.
6. M. Benedikt, P. Godefroid, and T. W. Reps. Model checking of unrestricted hierarchical state machines. In ICALP '01: Proceedings of the 28th International Colloquium on Automata, Languages and Programming, pages 652–666, London, UK, 2001. Springer-Verlag.
7. A. Bouajjani, J. Esparza, and O. Maler. Reachability analysis of pushdown automata: Application to model-checking. In International Conference on Concurrency Theory, pages 135–150, 1997.
8. O. Burkart, D. Caucal, F. Moller, and B. Steffen. Verification on infinite structures. North Holland, 2000.
9. O. Burkart and B. Steffen. Model checking the full modal mu-calculus for infinite sequential processes. In ICALP '97: Proceedings of the 24th International Colloquium on Automata, Languages and Programming, pages 419–429, London, UK, 1997. Springer-Verlag.
10. J. Cai and R. Paige. Program derivation by fixed point computation. Science of Computer Programming, 11(3):197–261, 1989.
11. S. Ceri, G. Gottlob, and L. Tanca. Logic Programming and Databases. Springer-Verlag, New York, NY, USA, 1990.
12. E. M. Clarke and E. A. Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Logic of Programs, Workshop, pages 52–71, London, UK, 1982. Springer-Verlag.
13. E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite state concurrent systems using temporal logic specifications: a practical approach. In POPL '83: Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 117–126, New York, NY, USA, 1983. ACM Press.
14. E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999.
15. J. Esparza, D. Hansel, P. Rossmanith, and S. Schwoon. Efficient algorithms for model checking pushdown systems. In CAV '00: Proceedings of the 12th International Conference on Computer Aided Verification, pages 232–247, London, UK, 2000. Springer-Verlag.
16. J. Esparza and S. Schwoon. A BDD-based model checker for recursive programs. In CAV '01: Proceedings of the 13th International Conference on Computer Aided Verification, pages 324–336, London, UK, 2001. Springer-Verlag.
17. A. Finkel, B. Willems, and P. Wolper. A direct symbolic approach to model checking pushdown systems. In Proc. 2nd Int. Workshop on Verification of Infinite State Systems (INFINITY'97), volume 9 of Electronic Notes in Theoretical Computer Science. Elsevier, 1997.
18. Y. A. Liu and S. D. Stoller. From Datalog rules to efficient programs with time and space guarantees. In Proceedings of the 5th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming, pages 172–183. ACM Press, 2003.
19. R. Paige. Real-time simulation of a set machine on a RAM, 1989.
20. R. Paige and S. Koenig. Finite differencing of computable expressions. ACM Trans. Program. Lang. Syst., 4(3):402–454, 1982.
21. J.-P. Queille and J. Sifakis. Specification and verification of concurrent systems in CESAR.
In Proceedings of the 5th Colloquium on International Symposium on Programming, pages 337–351, London, UK, 1982. Springer-Verlag.
22. T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Conference Record of POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 49–61, San Francisco, California, 1995.
23. K. Sagonas, T. Swift, and D. S. Warren. XSB as an efficient deductive database engine. In R. T. Snodgrass and M. Winslett, editors, Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, SIGMOD'94, pages 442–453, 1994.
24. B. Steffen. Generating data flow analysis algorithms from modal specifications. In TACS'91: Selected Papers of the Conference on Theoretical Aspects of Computer Software, pages 115–139, Amsterdam, The Netherlands, 1993. Elsevier Science Publishers B.V.
25. B. Steffen, A. Classen, M. Klein, J. Knoop, and T. Margaria. The fixpoint-analysis machine. In CONCUR '95: Proceedings of the 6th International Conference on Concurrency Theory, pages 72–87, London, UK, 1995. Springer-Verlag.

A Logic and Decision Procedure for Predicate Abstraction of Heap-Manipulating Programs

Jesse Bingham and Zvonimir Rakamarić
Department of Computer Science, University of British Columbia, Canada
jesse.d.bingham@intel.com, zrakamar@cs.ubc.ca
The authors were supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC). Jesse Bingham has since moved to Intel Corporation, Hillsboro, Oregon, U.S.A.

Abstract. An important and ubiquitous class of programs is that of heap-manipulating programs (HMPs), which manipulate unbounded linked data structures by following pointers and updating links. Predicate abstraction has proved to be an invaluable technique in the field of software model checking; this technique relies on an efficient decision procedure for the underlying logic. The expression and proof of many interesting HMP safety properties require transitive closure predicates; such predicates express that some node can be reached from another node by following a sequence of (zero or more) links in the data structure. Unfortunately, adding support for transitive closure often yields undecidability, so one must be careful in defining such a logic. Our primary contributions are the definition of a simple transitive closure logic for use in predicate abstraction of HMPs, and a decision procedure for this logic. Through several experimental examples, we demonstrate that our logic is expressive enough to prove interesting properties with predicate abstraction, and that our decision procedure provides us with both a time and space advantage over previous approaches.

1 Introduction

In recent years software model checking has emerged as a vibrant area of formal verification research. Much of the success of applying model checking to software has come from the use of predicate abstraction on the program source [16, 14, 3, 18]. In predicate abstraction, sets of states of the program and program transitions are over-approximated using a finite set of predicates over the program variables. These predicates (or boolean combinations thereof) typically express features of the program under verification, such as its conditionals and relevant propositions about its variables. An integral ingredient in predicate abstraction is a decision procedure for the logic of the predicates. Since most approaches involve many queries to this decision procedure, performance is paramount.
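As a toy illustration of these ideas (our own example, not from the paper; the formal definitions appear in Sect. 3), the following Python sketch abstracts concrete states, here pairs of integers, through two sample predicates:

    # Toy illustration of predicate abstraction (illustrative only): concrete
    # states are (x, y) integer pairs; each predicate maps a state to a boolean.
    predicates = [
        lambda s: s[0] == 0,        # phi_1:  x == 0
        lambda s: s[0] <= s[1],     # phi_2:  x <= y
    ]

    def alpha(state):
        """Abstract a concrete state to a tuple of predicate truth values."""
        return tuple(p(state) for p in predicates)

    # Two different concrete states may share an abstract state:
    assert alpha((0, 5)) == alpha((0, 7)) == (True, True)
    # The abstract state space is finite (2^k values for k predicates),
    # even though the concrete state space is infinite.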
An important class of programs consists of those we call heap-manipulating programs (HMPs), which are programs that access and modify linked data structures consisting of an unbounded number of uniform heap nodes. HMPs access the heap nodes through a finite number of pointers (that we call node variables) and by following pointer fields between nodes. To apply predicate abstraction to HMPs and assert many interesting correctness properties, one must be able to express the concept of unbounded reachability (a.k.a. transitive closure) between nodes. This is done through a binary operator that takes two node terms x and y, and asserts that the second can be reached from the first by following zero or more links; in our syntax this is written as f*(x, y) (f is the name of the link function). For example, f*(f(x), x) expresses that x is a node in a circular linked list. Several papers have previously identified the importance of transitive closure for HMPs [30, 31, 5, 19, 2, 23]. Unfortunately, adding support for transitive closure to even relatively tame logics often yields undecidability [19]. Our first contribution is a fragment of the decidable logics that we show (through several nontrivial experiments) is still expressive enough to verify properties of interest for HMPs using predicate abstraction. Decidability of our logic follows from a small model theorem, akin to that of Benedikt et al. [5] and Balaban et al. [2], which states that if a set of predicates is satisfiable, then it is satisfiable by a heap structure with some bounded number of nodes. A naive decision procedure can thus enumerate all of the (super-factorial but finite) number of structures of size up to this bound. We do not formally state or prove a small model theorem in this paper; rather, our second and most important contribution is an efficient decision procedure for our logic. We show that this procedure, though a worst-case exponential time algorithm, solves the vast majority of queries sent to it during predicate abstraction very quickly. The result is an approach that can have large time and memory savings over decision procedures that enumerate all models, even when BDDs are used for this enumeration, as done by Balaban et al. [2].

The paper is organized as follows. Sect. 2 summarizes other work on verification of HMPs. Predicate abstraction and our verification framework (based on previous work) are outlined in Sect. 3. HMPs are introduced in Sect. 4. Sects. 5 and 6 respectively define our transitive closure logic and the decision procedure. We present experimental results in Sect. 7. Sect. 8 draws conclusions and discusses several important extensions to our logic and decision procedure that we believe are possible, but have been left as future work. Please note that our technical report [6] provides proofs of the theorems, additional details regarding the decision procedure, pseudocode for the example programs, and the sets of predicates needed for their verification.

2 Related Work

Balaban et al. [2] present an approach for shape analysis based on predicate abstraction that is similar to ours.
The logic they use for describing properties of heap structures has slightly richer expressiveness than the logic we define in this paper (whereas our logic is unquantified, they allow restricted universal quantification). The major difference between the two approaches is the way a program abstraction is computed. To compute the abstraction, they employ a small model theorem and build BDDs representing all models up to the small model size. This is a bottleneck in both computation time and memory, since these BDDs tend to blow up. The technique of Kesten and Pnueli [21] for establishing termination employed by Balaban et al. is likely compatible with our work also.

McPeak and Necula [28] specify heap data structures using local equality axioms, which constrain only a bounded fragment of the heap around some node. This enables them to describe a variety of shapes and reason about scalar values without abstracting them, while still preserving decidability. However, they can only approximate reachability between nodes (though unreachability is precise). When pointer disequalities are added, their decision procedure becomes incomplete. We handle both reachability and disequalities, but we cannot describe such a variety of shapes. In addition, we compute an inductive invariant of a program automatically (given an appropriate set of predicates), while they require the user to provide loop invariants, which can be a significant burden.

The Pointer Assertion Logic Engine (PALE) [29] specifies heap structures using graph types [22], which are tree-shaped data structures augmented with extra pointers. The authors show that many common heap structures can be defined that way, some of which we cannot express. PALE relies on a decision procedure with non-elementary complexity, so there are programs that cannot be verified in practice. Furthermore, loop invariants must be provided by the user.

The Three-Valued Logic Analyzer (TVLA) [32, 25] extends conventional abstract interpretation with a third "uncertain" logic value, and builds so-called 3-valued logical structures that abstract the reachable states at each program point (a.k.a. canonical abstraction). The abstract semantics of program statements are defined by abstract transformers, which can be generated by TVLA or user-defined if necessary. We cannot handle all heap structures that TVLA can; however, the abstract invariant we compute is always the most precise w.r.t. the given set of predicates. TVLA does not make such a guarantee, although some work has been done to make TVLA more precise [33]. TVLA is also employed by Manevich et al. [27], who observe that the number of shared nodes in linked lists is bounded and present a novel definition of "uninterrupted list segments". This is used to define predicate and canonical abstractions of potentially circular singly linked lists, and enables them to verify some HMPs that we are not able to verify, though their properties tend to be simpler than ours (see Sect. 7).

Lahiri and Qadeer [23] define two new predicates to express reachability of heap nodes in linked lists. To prove properties of HMPs, they use first-order axioms over those predicates. The given set of axioms is incomplete, and they provide an induction principle that is used to derive additional axioms when necessary.
Because of the purely first-order axiomatization, they are able to harness the power of available automated theorem provers; they use UCLID [8] as the underlying inference engine. Dams and Namjoshi [11] propose an approach based on predicate abstraction and model checking. They abstract a program by iteratively calculating weakest preconditions of shape predicates, and are able to handle second-order shape properties such as reachability, cyclicity, and sharing. The algorithm does not use a decision procedure, and as a consequence, new predicates can be generated in every iteration. Hence, the algorithm often has to be manually provided with "approximation hints" to converge.

3 Verification Approach
3.1 Predicate Abstraction
Our approach to verifying heap programs is based on predicate abstraction [16], which is an instance of abstract interpretation [10]. In the framework of abstract interpretation, a concrete system (in our case an HMP) is verified by constructing a finite-state overapproximation of the concrete system called the abstract system. Let C (the concrete states) be the set of states of the concrete system. Predicate abstraction employs a finite set of predicates φ1, . . . , φk in some logic that are assertions about concrete states. Corresponding to the predicates, respectively, are the abstract boolean variables b1, . . . , bk. The set of abstract states A will be the set of assignments to the abstract boolean variables. The abstraction function α : C → A is defined such that α(c)(bi) = true if and only if c ⊨ φi. A set of concrete states C′ ⊆ C is then abstracted by α(C′) = {α(c) | c ∈ C′}. Note that since A is finite, α(C′) is always finite as well. In contrast, C is often infinite; in our case the infinitude of concrete states arises from the unboundedness of the heap in HMPs. Let R ⊆ C be the set of concrete states that are reachable in the concrete system. We wish to verify that a property expressed as a state assertion ψ over the concrete states holds for all members of R, i.e., that the implication R → ψ holds. Predicate abstraction is used to solve this problem by computing a set Rα ⊆ A such that α(R) ⊆ Rα. Verification succeeds if one can prove that Rα → ψ. A key difference in the various approaches to predicate abstraction is how Rα is computed [16, 14, 12, 15, 2, 11]. This typically involves numerous queries to a decision procedure for the underlying logic, and there are tradeoffs between how accurately Rα approximates α(R) and the number and complexity of these queries. Rα is usually computed as a fixpoint of some approximation of the abstract post image operator post : 2^A → 2^A, defined as follows. Given a set of abstract states A, let

post(A) = {α(c′) | ∃c, c′ ∈ C. (c, c′) ∈ T ∧ α(c) ∈ A}

where T is the transition relation of the concrete system. post(A) is thus the set of abstract states representing concrete states that are concrete successors of those states represented by A.
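As a minimal illustration of these definitions (ours, not the paper's implementation), the abstraction function and the fixpoint computation of Rα can be sketched as follows, assuming the predicates are given as boolean functions on concrete states and post as a set-to-set function:

def alpha(c, predicates):
    # the abstract state of c: the tuple of truth values of phi_1..phi_k
    return tuple(phi(c) for phi in predicates)

def abstract_reach(init_abstract, post):
    # Least set containing init_abstract and closed under post. Since
    # post distributes over disjunction (union), it suffices to apply
    # it to the newly discovered frontier at each step.
    reached = set(init_abstract)
    frontier = set(init_abstract)
    while frontier:
        new = post(frontier) - reached
        reached |= new
        frontier = new
    return reached

The loop terminates because the abstract state space A is finite; the result Rα then overapproximates α(R).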
Since predicate abstraction is an incomplete approach, if it fails to verify the property, this can happen either because the concrete system actually violates the property, or because of the loss of information inherent in the abstraction. Finding the "right" set of predicates for verification to go through can be a tricky business. Many works have addressed this issue of predicate discovery [13, 4, 18, 11], which falls under the more general umbrella of abstraction refinement [9]. As in recent papers on this topic [2, 23], in our current framework predicates are added by manual inspection of counterexample behaviors; applying automatic predicate discovery techniques is an important area of future work.

3.2 Computing post
Our tool computes post precisely; the algorithm can be viewed as an improvement over the following naive algorithm. Since post distributes over disjunction (meaning that post(A1 ∨ A2) = post(A1) ∨ post(A2)), computing post(A) is reducible to computing post(ρ) for each cube ρ in some disjunctive normal form decomposition of A. Here, a cube means a partial boolean assignment to the abstract variables (a partial boolean assignment maps each variable bi to an element of {true, false, undef}), and represents all abstract states that agree on this subset of the abstract variables. By using a BDD [7] to represent A, we can easily obtain such a decomposition. The naive algorithm cycles through all 2^k abstract states a, and checks if a ∈ post(ρ); post(ρ) is then the BDD representing the disjunction of all such a. Each check of a ∈ post(ρ) involves a call to the decision procedure to determine whether the following formula is satisfiable:

γ(ρ) ∧ wp(γ(a))    (1)

where γ is the concretization function, and wp is the weakest precondition operator [17]. Intuitively, γ maps a cube to a logic formula that denotes the set of concrete states represented by the cube. Formally, for a cube µ let P(µ) (resp. N(µ)) denote the set {i | µ(bi) = true} (resp. {i | µ(bi) = false}). Then define

γ(µ) = ⋀i∈P(µ) φi ∧ ⋀i∈N(µ) ¬φi.

The weakest precondition operator wp is a syntactic transformation on logic formulas that depends on the program statement under consideration [17]. For example, for an assignment statement x := e, where x is a variable and e is some expression, wp(π) is constructed by syntactically replacing all occurrences of x with e in the formula π (this only works under the assumption that x cannot be aliased). Our approach applies wp at the granularity of individual program statements when performing predicate abstraction. Das et al.'s computation of post, which we employ, uses several straightforward optimizations over this naive algorithm [14]. First, if (1) contains a syntactic contradiction, meaning the existence of a predicate and its negation, then clearly the formula is not satisfiable. In such circumstances there is no need to call the decision procedure. When computing post(ρ), our implementation initially computes a BDD C representing the set of all a that will not yield such a contradiction. Second, rather than enumerating all a ∈ C, we do recursive case-splitting on the abstract variables, which allows for pruning of large portions of C. For example, let µ be the cube that assigns true to b1 and leaves all other variables unconstrained. If γ(ρ) ∧ wp(γ(µ)) is unsatisfiable, then so too is γ(ρ) ∧ wp(γ(a)) for any abstract state a that has b1 equal to true. Hence, our algorithm would then only explore those abstract states having b1 false.
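The recursive case-splitting can be sketched as follows (our illustration; the paper's implementation is BDD-based, which we replace here by explicit sets of assignments, and the helper names gamma_wp and unsat are ours). A partial cube µ is extended one abstract variable at a time, and an entire subtree of abstract states is pruned as soon as γ(ρ) ∧ wp(γ(µ)) is found unsatisfiable:

def post_cube(rho, k, gamma_wp, unsat):
    """Return the abstract states a (total assignments to b_1..b_k, as
    tuples of booleans) such that a is in post(rho). gamma_wp(rho, mu)
    is assumed to build the literals of gamma(rho) /\ wp(gamma(mu)),
    and unsat is an assumed decision-procedure oracle."""
    result = []
    def split(mu):
        if unsat(gamma_wp(rho, mu)):
            return                  # prune every extension of mu at once
        if len(mu) == k:
            result.append(tuple(mu))
            return
        split(mu + [True])          # case-split on the next variable
        split(mu + [False])
    split([])
    return result

The cheap syntactic-contradiction filter of the first optimization would sit in front of the unsat call; we omit it here for brevity.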
4 Heap-Manipulating Programs
In our framework, the heap consists of an unbounded number of nodes. HMPs allow for node variables (pointers), data fields for nodes, and a link field f for nodes; all other variables are modelled as (or encoded using) booleans. In lieu of a formal presentation of HMPs, we give an example called ND-Insert in Fig. 1 that captures most of the interesting features. This program takes a node head and a node item, and inserts item into the linked list pointed to by head at a position selected nondeterministically.

1: procedure ND-Insert(head, item)
2:   assume ¬f*(head, item) ∧ f*(head, nil) ∧ ¬head = nil ∧ f(item) = nil ∧ p = head
3:   while true do
4:     if ND ∨ f(p) = nil then
5:       f(item) := f(p);
6:       f(p) := item;
7:       break
8:     else
9:       p := f(p);
10:    end if
11:  end while
12:  assert f*(head, item) ∧ f*(head, nil)
13: end procedure
Fig. 1. A program that nondeterministically inserts a node item into the list pointed to by head. Here ND is a boolean value that is nondeterministically true or false.

head is assumed to be non-nil and to point to an acyclic linked list that does not contain item. These assumptions are formalized by the assume statement on line 2 of the program. In the assume statement, and also in the assert statement, the subformulas of the form f*(x, y) express that node y is reachable from node x by following a sequence of f links of any length; we will formally define these predicates in Sect. 5. The fact that nil is reachable from head enforces the acyclicity assumption (in our logical framework, nil is modelled simply as a node having f(nil) = nil). The body of ND-Insert is straightforward; a pointer p walks the list, and item is inserted at some point. The loop breaks once the insertion has occurred. The expression ND represents a nondeterministic boolean value. item is inserted when either ND = true, or the end of the list is reached (detected by the disjunct f(p) = nil on line 4). The specification is expressed by the assert statement on line 12, and indicates that whenever line 12 is reached, head must point to an acyclic list that contains item. The verification problem we wish to solve can be stated as follows: given an HMP, determine whether all executions that satisfy all assume statements also satisfy all assert statements. Since the number of nodes in the heap is unbounded, HMPs are generally infinite-state; thus one cannot directly apply finite-state model checking to this problem without using abstraction.

5 A Simple Transitive Closure Logic
Our logic assumes finite sets of node variables V, boolean variables B, data function variables D, and a single link function symbol f. The term, atom, and literal syntactic entities are given in Fig. 2. Literals of the form x = y, ¬x = y, f*(x, y), and ¬f*(x, y) (where x and y are terms) are called equality, inequality, reachability, and unreachability literals, respectively. Literals of the form d(x) or ¬d(x), where d ∈ D, are called data literals, while those of the form b or ¬b are simply called boolean variable literals.

v ∈ V    d ∈ D    b ∈ B
term ::= v | f(term)
atom ::= f*(term, term) | term = term | d(term) | b
literal ::= atom | ¬atom
Fig. 2. The syntax of our simple transitive closure logic
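One possible machine representation of this syntax (our own encoding, not something prescribed by the paper) is to record a term as a node variable under some number of applications of f, and a literal as a signed atom:

from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    var: str        # the underlying node variable v
    links: int = 0  # number of f-applications; links=2 encodes f(f(v))

@dataclass(frozen=True)
class Literal:
    kind: str       # 'eq', 'reach', 'data', or 'bool'
    args: tuple     # (Term, Term) for eq/reach; (d, Term) for data; (b,) for bool
    positive: bool = True

# f*(f(x), x): x is a node in a circular linked list
on_cycle = Literal('reach', (Term('x', links=1), Term('x')))

Because the only function symbol is the unary f, a term is determined by its variable and its nesting depth, which keeps the encoding flat.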
The structures over which the semantics of our logic is defined are called heap structures. A heap structure H = (N, Θ) involves a finite set of nodes N and a function Θ that interprets each symbol σ in V ∪ B ∪ D ∪ {f} such that

Θ(σ) ∈ N                  if σ ∈ V
Θ(σ) ∈ {true, false}      if σ ∈ B
Θ(σ) ∈ N → {true, false}  if σ ∈ D
Θ(σ) ∈ N → N              if σ = f

Thus Θ interprets each node variable as a node, each boolean variable as a boolean value, each data function variable as a function that maps nodes to booleans, and the link function f is interpreted as a mapping from nodes to nodes. Heap structures naturally model a linked data structure of nodes, each node having a single pointer to another node and some finite number of boolean-valued fields. The size of H is defined to be |N|. The variables of V model program variables that point to nodes in the data structure, while the variables of B model program variables of boolean type. Clearly, program variables or node fields of any finite enumerated type can be encoded using the booleans accommodated by our logic. We extend Θ to Θ^e, which interprets any term or atom in the obvious way, formally defined here. The interpretation of a term τ is defined inductively by: Θ^e(τ) = Θ(τ) if τ ∈ V, and Θ^e(τ) = Θ(f)(Θ^e(τ′)) if τ has the form f(τ′) for some term τ′. Θ^e interprets atoms as boolean values. An equality atom τ1 = τ2 is interpreted as true by Θ^e iff Θ^e(τ1) = Θ^e(τ2). A data atom is interpreted by defining Θ^e(d(τ)) = Θ(d)(Θ^e(τ)). A reachability atom f*(τ1, τ2) is interpreted as true iff there exists some n ≥ 0 such that Θ(f)^n(Θ^e(τ1)) = Θ^e(τ2) (here, function exponentiation represents iterative application: for a function g and an element x in its domain, g^0(x) = x, and g^n(x) = g(g^{n−1}(x)) for all n ≥ 1). Finally, a literal that is not an atom is of the form ¬φ where φ is an atom, and we simply define Θ^e(¬φ) = ¬Θ^e(φ). Sticking to the usual notation, given a heap structure H = (N, Θ) and a literal φ, we write H ⊨ φ iff Θ^e(φ) = true. For a set of literals Φ, we write H ⊨ Φ iff H ⊨ φ for all φ ∈ Φ.

6 Decision Procedure
The decision problem we aim to solve with our decision procedure is this: given a finite set of literals Φ, does there exist a heap structure H such that H ⊨ Φ? If there is such an H, then we say that Φ is satisfiable; otherwise Φ is unsatisfiable. Clearly, any algorithm for this problem can be used to decide the satisfiability of a conjunction-of-literals (1) by simply taking Φ to be the set of its conjuncts. Decidability of this problem follows from a small model theorem enjoyed by our logic, akin to other transitive closure logics [2, 5]. Our small model theorem states that Φ is satisfiable if and only if there exists H of size at most n such that H ⊨ Φ, where n is the number of distinct terms mentioned in Φ.
Hence, a decision procedure can simply enumerate the finite set of such H, and for each one check whether H ⊨ Φ. However, since the number of such heap structures is at least n^n, this approach is impractical. Employing BDDs [7] to represent the set of heap structures that satisfy Φ [2] is also memory-intensive; building a BDD for the literal f*(x, y) over just 8 nodes cannot be done in 2 GB of memory. This stems from the fact that such a BDD must represent the multitude of different paths that could exist between the nodes Θ^e(x) and Θ^e(y). Our approach has relatively small memory requirements, and is based on a set of inference rules (IRs) with the property that Φ is satisfiable if and only if their exhaustive application does not introduce a contradiction. Here, contradiction means the inference of both an atom φ and its negation ¬φ. The IRs are presented in Fig. 3. For an IR r, the antecedents of r are the literals appearing above the line, while the consequents are those appearing below the line. We say that an IR r is applicable (to Φ) if there are terms appearing in Φ such that when these terms are substituted for the term placeholders of r (i.e., x, y, z, x1, etc.), all of r's antecedents appear in Φ, and none of r's consequents appear in Φ.

IDENT:   ⊢ v = v
REFLEX:  ⊢ f*(v, v)
TRANS1:  f(x) = y ⊢ f*(x, y)
TRANS2:  f*(x, y), f*(y, z) ⊢ f*(x, z)
FUNC:    f(x) = y, f*(x, z) ⊢ x = z or f*(y, z)
CYCLEk:  f(x1) = x2, . . . , f(xk) = x1, f*(x1, y) ⊢ y = x1 or . . . or y = xk
SCC:     f*(x, y), f*(y, x), f*(x, z) ⊢ x = y or f*(z, x)
TOTAL:   f*(x, y), f*(x, z) ⊢ f*(y, z) or f*(z, y)
SHARE:   f(x) = z, f(y) = z, f*(x, y), f*(y, x) ⊢ x = y
Fig. 3. The set of inference rules, written with antecedents to the left of ⊢ and alternative consequents, separated by "or", to the right. Here x, y, and z range over (not necessarily distinct) terms. In the rules IDENT and REFLEX, v is restricted to be a variable that is already mentioned; this restriction prevents either of these rules from introducing new terms. CYCLEk actually defines a separate rule for each k ≥ 1.

We now explain each IR of Fig. 3. IDENT states that each node variable is equal to itself, while REFLEX enforces that any node variable is reachable from itself. TRANS1 states that the transitive closure f* must extend the function f. TRANS2 simply enforces that f* is transitive. FUNC asserts that if f(x) = y and z is reachable from x, then z must also be reachable from y, unless x = z. If there is a cycle of length k ≥ 1 in f, then it follows that any node y reachable from a node on the cycle must be on the cycle as well; this is formalized by CYCLEk. Similar to FUNC is SCC, which states that if x and y are distinct and mutually reachable from each other, and z is reachable from x, then x is reachable from z (since x must lie on a cycle of f). TOTAL requires that if y and z are both reachable from another node x, then there must exist some reachability relationship between y and z. The fact that in a cycle of f, no two distinct nodes x and y can have f(x) = f(y) is captured by SHARE. Given the preceding intuition, it is easy to prove the following.

Theorem 1. The inference rules of Fig. 3 are sound.

Theorem 1 tells us that if iterative application of the IRs yields a contradiction, then we can conclude that the original set of literals is unsatisfiable. Conversely, we have proven our IRs to be complete with respect to sets of literals in a certain normal form, and Theorem 2 below states that it is sufficient to restrict attention to such sets. Let Vars(Φ) denote the subset of the node variables V appearing in Φ.

Definition 1 (normal). A set of literals Φ is said to be normal if
1. For each vi ∈ Vars(Φ), there exists (a) at most one equality literal of the form f(vi) = vj, where vj ∈ Vars(Φ), and (b) the literal vi = vi. All equality literals of Φ are required to be of one of the forms (a) or (b).
2. All inequality literals are of the form ¬vi = vj, where vi, vj ∈ Vars(Φ).
3. All reachability literals are of the form f*(vi, vj), where vi, vj ∈ Vars(Φ).
4. All unreachability literals are of the form ¬f*(vi, vj), where vi, vj ∈ Vars(Φ).
5. There exist no data or boolean variable literals in Φ.

Theorem 2. There exists a polynomial-time algorithm that transforms any set Φ into a normal set Φ′ such that Φ′ is satisfiable if and only if Φ is satisfiable.
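Theorem 2 only asserts that such an algorithm exists; the details are in the technical report [6]. A plausible core step (our sketch, under the assumption that the usual term flattening is intended) introduces a fresh node variable for each nested f-application, so that all remaining literals mention variables only:

import itertools

def flatten_term(term, literals, fresh):
    """term is ('v', name) or ('f', subterm). Returns a variable name,
    appending an equality literal ('feq', u, w), recording f(u) = w,
    for every application of f that is peeled off."""
    if term[0] == 'v':
        return term[1]
    u = flatten_term(term[1], literals, fresh)
    w = next(fresh)
    literals.append(('feq', u, w))
    return w

# Example: flattening f*(f(x), y) yields f(x) = t0 together with f*(t0, y).
fresh = (f"t{i}" for i in itertools.count())
lits = []
v = flatten_term(('f', ('v', 'x')), lits, fresh)
lits.append(('reach', v, 'y'))

Only linearly many fresh variables and literals are introduced, which is consistent with the polynomial bound claimed by the theorem.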
Thanks to Theorem 2, our decision procedure can without loss of generality assume that Φ is normal. Let us call a set of literals Φ consistent if it does not contain a contradiction, and call Φ closed if none of the IRs of Fig. 3 are applicable. The following completeness result is the crux of our decision procedure.

Theorem 3. If Φ is consistent, closed, and normal, then Φ is satisfiable.

The proof of Theorem 3 is quite technical, and involves reasoning about the dependencies between digraphs of partial functions and the digraphs of their transitive closures. For details, please see our technical report [6]. Viewed from a high level, our decision procedure first applies the transformation of Theorem 2, and then repeatedly searches for an applicable IR, applies it (i.e., adds a consequent to the set), and recurses. The recursion is necessary for those IRs that branch, i.e., have multiple consequents. If the procedure ever infers a contradiction, it backtracks to the last branching IR with an unexplored consequent, or returns unsatisfiable if there is no such IR. If the procedure reaches a point where there are no applicable IRs and no contradictions, then the inferred set of literals is consistent, closed, and normal. Hence, by Theorem 3, it may correctly return satisfiable. Our technical report [6] provides a more formal presentation of the decision procedure. We note that our decision procedure is guaranteed to terminate because none of the IRs introduce new terms.
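The procedure just described fits in a few lines once the rules are packaged appropriately. In the sketch below (ours; the representation is our own choice), a literal is a pair (positive, atom), and applicable(lits) is assumed to return the list of alternative consequents of some applicable rule instance, or None when the set is closed:

def negate(lit):
    pos, atom = lit
    return (not pos, atom)

def contradictory(lits):
    return any(negate(l) in lits for l in lits)

def satisfiable(lits, applicable):
    if contradictory(lits):
        return False                 # backtrack (or fail at the top level)
    branches = applicable(lits)
    if branches is None:
        return True                  # consistent, closed, normal: Theorem 3
    # try each alternative consequent of the chosen rule instance
    return any(satisfiable(lits | {c}, applicable) for c in branches)

Termination holds for the same reason given above: no rule introduces new terms, so only finitely many literals can ever be added along any branch.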
6.1 An Extension
In order to handle program assignments that mutate the links in the heap, i.e., modify f, we must extend our logic and decision procedure to support simultaneous reference to f and f′, which respectively model the link function before and after the assignment. Such an assignment has the general form f(τ1) := τ2, where τ1 and τ2 are arbitrary terms. Lines 5 and 6 of the HMP of Fig. 1 are examples of such assignments. The semantic relationship between f and f′ can be expressed using the well-known update operator (if g is a function, a is an element in g's domain, and b is an element in g's codomain, then update(g, a, b) is defined to be the function λx.(if x = a then b else g(x))):

Θ^e(f′) = update(Θ^e(f), Θ^e(τ1), Θ^e(τ2))    (2)

Rather than support update as an interpreted second-order function symbol in the logic, we add inference rules that implicitly enforce the constraint (2). For each of the eight IRs of Fig. 3 that mention f, we add an analogous IR with f replaced by f′; these enforce analogous constraints between f′*, f′, and = as are enforced by the unmodified IRs of Fig. 3 between f*, f, and =. Furthermore, to enforce the constraint (2), the seven IRs of Fig. 4 are also included. The IRs introduce a fresh variable w that is forced to be equal to f(τ1). This allows us to state that Θ^e(f) = update(Θ^e(f′), Θ^e(τ1), Θ^e(w)), and hence the symmetry between the IRs UPDFUNC1 and UPDFUNC2, between UPDTRANS1 and UPDTRANS2, and between UPDTRANS3 and UPDTRANS4. Note that these IRs can introduce new terms; however, given a normal set of literals, the number of new terms is bounded. This implies that the extended decision procedure always terminates.

Fig. 4. The update inference rules (UPDATE, UPDFUNC1, UPDFUNC2, and UPDTRANS1 through UPDTRANS4), which are used to extend our logic to support a second function symbol f′, with the implicit constraint f′ = update(f, τ1, τ2), where τ1 and τ2 are fixed but arbitrary terms, and w is a fresh variable used to capture f(τ1). Note that the rule UPDATE can introduce literals that violate normalcy (Def. 1) in the case that τ1 or τ2 are not variables. However, this can be remedied by the addition of a new variable and equality literal for each sub-term of τ1 and τ2.

Theorem 4. The inference rules of Fig. 4 are sound.

The proof of this theorem is provided in our technical report [6]. We have yet to flesh out the details of a proof of a conjecture analogous to Theorem 3 stating that this extended set of IRs is complete. However, we have empirical support for this conjecture: in conducting our experiments of Sect. 7, we never found any property violations caused by the extended decision procedure erroneously concluding that a set of literals was satisfiable. Of course, not having such a theorem does not compromise the soundness of verification by predicate abstraction.
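For intuition, the update operator itself is easy to render in code (a sketch of ours; the example values are hypothetical):

def update(g, a, b):
    # the function that agrees with g everywhere except at a
    return lambda x: b if x == a else g(x)

f = {0: 1, 1: 2, 2: 2}.get      # a tiny pre-state link function
f_prime = update(f, 2, 0)       # the post-state of the assignment f(2) := 0
assert f_prime(2) == 0 and f_prime(1) == 2

Constraint (2) says precisely that the interpretation of f′ is such an update of the interpretation of f.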
7 Experiments
We have tested our tool on a number of HMP examples and summarized the results in Table 1. We ran the experiments on a Pentium 4 2.6 GHz machine. The safety properties we checked (when applicable) at the end of the HMP are:
– no leaks (NL) – all nodes reachable from the head of the list at the beginning of the program are also reachable at the end of the program.
– insertion (IN) – a distinguished node that is to be inserted into a list is actually reachable from the head of the list, i.e., the insertion "worked".
– acyclic (AC) – the final list is acyclic, i.e., nil is reachable from the head of the list.
– cyclic (CY) – the list is a singly linked circular list, i.e., the head of the list is reachable from its successor.
– sorted (SO) – the list is a sorted linked list, i.e., each node's data field is less than or equal to its successor's.
– remove elements (RE) – for examples that remove node(s), this states that the node(s) was (were) actually removed. For the program Remove-Elements, RE also asserts that the data field of all removed elements is false.
Often, the properties one is interested in verifying for HMPs involve universal quantification over the heap nodes. For example, to assert the property NL, we must express that for all nodes t, if t is reachable from head initially, then t is also reachable from head (or some other node) at the end of the program. Since our logic does not support quantification, we use the trick of introducing a Skolem constant t [15, 2] to represent a universally quantified variable. Here, t is a new node variable that is initially assumed to satisfy the antecedent of our property, and is otherwise unmodified by the program. For the example program of Fig. 1, we can express NL by conjoining ¬t = nil ∧ f*(head, t) to the assume statement on line 2, and conjoining f*(head, t) to the assertion on line 12. Since t can be any non-nil node reachable from head, if the assertion is never violated, we have proven NL.
Our example programs are the following:
List-Reverse – a classical HMP example that performs in-place reversal of a linked list.
List-Add – a linked list is traversed, and the end of the list is reached. Then, a node is added to the end of the list.
ND-Insert – pseudocode for this example is given in Fig. 1.
ND-Remove – similar to ND-Insert, except that instead of inserting a node, a node is nondeterministically chosen and removed from the list.
Zip – zips two linked lists, shuffling the elements of both lists into one. Then, the tail of the longer list is appended to the resulting list. This example is taken from a paper by Jensen et al. [20].
Sorted-Zip – joins the elements of two sorted lists into one, also sorted. Here the data elements are simply booleans, so "sorted" means that all nodes with false fields come before nodes with true fields.
Sorted-Insert – inserts a node into a sorted linked list so that sortedness is preserved. This is a modification of the example from a technical report by Lahiri and Qadeer [23] (to simplify things, they require that the input list start with a dummy element whose data field value has to be less than all possible values of that data field; we don't have such requirements in our example, which makes it slightly more complicated).
Bubble-Sort – sorts the elements of a linked list using the bubble sort algorithm. It is taken from a paper by Balaban et al. [2]. The data fields on which we sort are again booleans.
Remove-Elements – removes from a circular list all elements whose data field is false.
Our technical report [6] provides pseudocode and lists the required predicates for these examples.

Table 1. Results of verifying HMPs. "property" specifies the verified property; "CFG edges" denotes the number of edges in the control-flow graph of the program; "preds" is the number of predicates required for verification; "time" is the average execution time over five runs to prove the properties; "DP calls" is the number of decision procedure queries. The largest memory usage for all these examples was 125 MB.

program           property       CFG edges  preds  time (sec)  DP calls
List-Reverse      NL             6          8      1.1         173
List-Add          NL ∧ AC ∧ IN   7          8      0.8         66
ND-Insert         NL ∧ AC ∧ IN   5          13     7.9         258
ND-Remove         NL ∧ AC ∧ RE   5          12     19.3        377
Zip               NL ∧ AC        20         22     280.7       9185
Sorted-Zip        NL ∧ SO ∧ IN   28         22     249.9       13760
Sorted-Insert     NL ∧ AC ∧ SO   10         20     217.6       8476
Bubble-Sort       NL ∧ AC        21         24     204.6       5930
Bubble-Sort       NL ∧ AC ∧ SO   21         27     441.0       6324
Remove-Elements   NL ∧ CY ∧ RE   15         17     1263.7      26721

As Table 1 shows, we were successful in verifying interesting properties of many examples in reasonable amounts of time. Of special note is our verification of sortedness for Bubble-Sort. This example is from Balaban et al. [2]; because of the BDD blowup inherent in their decision procedure, their tool ran out of memory for the small model bound necessary for sound verification [1]. In contrast, our trading of space for time appears to be quite advantageous here. The running time of TVLA on the bubble sort example is somewhat faster than ours, although they are using a slower machine [24].
The recent experimental results of Manevich et al. [27] are significantly faster, in spite of the fact that they were executed on a slower machine. For most of their examples, however, they only verify the simple property of no null dereferences (they also verify cyclicity for two examples). We are verifying more complicated properties, for instance SO. Very recently, Loginov et al. [26] have used TVLA to fully automatically verify the bubble sort example. For the two examples in common with Lahiri and Qadeer [23], List-Reverse and Sorted-Insert, we are significantly faster at verifying the same properties, with respective speed-ups of 75× and 6× (we were unable to run our tool on four of Lahiri and Qadeer's [23] examples because we have yet to implement support for data field mutations). It should be noted, however, that we used a slightly faster machine, and also that for Sorted-Insert, our data fields are merely booleans, while theirs are the full integers.

8 Future Work and Conclusions
Despite the fact that this work is in its early stages, our experiments demonstrate its effectiveness for verification of heap-manipulating programs. There are many directions for future research, which are outlined here. We have identified the following issues related to the expressiveness of the simple transitive closure logic presented in this paper:
– This paper only supports a single link function f, yet clearly many heap-manipulating programs involve multiple link fields.
– We have found that even minimal support for universally quantified variables (as in the logic of Balaban et al. [2]) would allow expression of common heap structure attributes. For example, the current logic cannot assert that two terms x and y point to disjoint linked lists; a single universally quantified variable would allow for this property (see Nelson [30, page 22]). We found that capturing disjointedness is necessary for verifying that List-Reverse always produces an acyclic list; hence we were unable to verify this property.
– Another situation that cannot be characterized relates to term ordering in circularly linked lists. Suppose x, y, and z are terms in such a list; we would like to express that y does or does not "come between" x and z in the list. Nelson [31] and Manevich et al. [27] have previously recognized the importance of such properties.
We believe that our decision procedure can be enhanced to handle each of these three cases. A final expressiveness deficiency, to which we see no immediate solution, is the expression of more involved heap structure properties, in particular trees. Though our logic cannot capture "x points to a tree", we believe it is possible that an extension could be used to verify simple properties of programs that manipulate trees, for example that there are no memory leaks. We also plan on investigating how existing techniques for predicate discovery and more advanced predicate abstraction algorithms mesh with our decision procedure. Our approach appears to be very promising, despite the fact that we have yet to harness the recent innovations in these areas.

Acknowledgement
We acknowledge our mutual supervisor Alan J. Hu for his support during this project and Ittai Balaban, Shuvendu Lahiri, and Shaz Qadeer for answering our questions; we also thank Shaz for suggesting this research problem to us.

References
1. I. Balaban. Personal correspondence, 2005.
2. I. Balaban, A. Pnueli, and L. Zuck. Shape analysis by predicate abstraction. In Conf.
on Verification, Model Checking and Abstract Interpretation (VMCAI), 2005. 3. T. Ball, R. Majumdar, T. D. Millstein, and S. K. Rajamani. Automatic predicate abstraction of C programs. In Conf. on Programming Language Design and Implementation (PLDI), pages 203–213, 2001. 4. T. Ball, A. Podelski, and S.K. Rajamani. Relative completeness of abstraction refinement for software model checking. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2002. 5. M. Benedikt, T. Reps, and M. Sagiv. A decidable logic for describing linked data structures. In European Symposium on Programming (ESOP), 1999. 6. J. Bingham and Z. Rakamarić. A logic and decision procedure for predicate abstraction of heap-manipulating programs, 2005. UBC Dept. Comp. Sci. Tech Report TR-2005-19, http://www.cs.ubc.ca/cgi-bin/tr/2005/TR-2005-19. 7. R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. on Computers, C-35(8):677–691, August 1986. 8. R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and verifying systems using a logic of counter arithmetic with lambda expressions and uninterpreted functions. In Conf. on Computer Aided Verification (CAV), pages 78 – 92, 2002. 9. E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In Conf. on Computer Aided Verification (CAV), pages 154–169, 2000. 10. P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Symp. on Principles of Programming Languages (POPL), pages 238–252, 1977. 11. D. Dams and K. S. Namjoshi. Shape analysis through predicate abstraction and model checking. In Conf. on Verification, Model Checking and Abstract Interpretation (VMCAI), pages 310–323, 2003. 12. S. Das and D. L. Dill. Successive approximation of abstract transition relations,. In IEEE Symp. on Logic in Computer Science (LICS), 2001. 13. S. Das and D. L. Dill. Counter-example based predicate discovery in predicate abstraction. In Formal Methods in Computer-Aided Design (FMCAD), 2002. 14. S. Das, D. L. Dill, and S. Park. Experience with predicate abstraction. In Conf. on Computer Aided Verification (CAV), 1999. 15. C. Flanagan and S. Qadeer. Predicate abstraction for software verification. In Symp. on Principles of Programming Languages (POPL), pages 191–202, 2002. 16. S. Graf and H. Saidi. Construction of abstract state graphs with PVS. In Conf. on Computer Aided Verification (CAV), 1997. 17. D. Gries. The Science of Programming. Springer-Verlag, New York, 1981. 18. T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy abstraction. In Symp. on Principles of Programming Languages (POPL), pages 58–70, 2002. A Logic and Decision Procedure for Predicate Abstraction of HMPs 221 19. N. Immerman, A. Rabinovich, T. Reps, M. Sagiv, and G. Yorsh. The boundary between decidability and undecidability for transitive closure logics. In Workshop on Computer Science Logic (CSL), pages 160–174, 2004. 20. J. L. Jensen, M. E. Jørgensen, N. Klarlund, and M. I. Schwartzbach. Automatic verification of pointer programs using monadic second-order logic. In Conf. on Programming Language Design and Implementation (PLDI), pages 226–236, 1997. 21. Y. Kesten and A. Pnueli. Verification by augmented finitary abstraction. Information and Computation, 163(1):203–243, 2000. 22. N. Klarlund and M. I. Schwartzbach. Graph types. In Symp. on Principles of Programming Languages (POPL), pages 196–205, 1993. 23. S. K. Lahiri and S. 
Qadeer. Verifying properties of well-founded linked lists, 2005. Microsoft Research Tech Report MSR-TR-2005-97. 24. T. Lev-Ami, T. Reps, M. Sagiv, and R. Wilhelm. Putting static analysis to work for verification: A case study. In Intl. Symp. on Software Testing and Analysis, pages 26–38, 2000. 25. T. Lev-Ami and M. Sagiv. TVLA: A system for implementing static analyses. In Static Analysis Symposium (SAS’00), pages 280–301, 2000. 26. A. Loginov, T. W. Reps, and S. Sagiv. Abstraction refinement via inductive learning. In Conf. on Computer Aided Verification (CAV), pages 519–533, 2005. 27. R. Manevich, E. Yahav, G. Ramalingam, and M. Sagiv. Predicate abstraction and canonical abstraction for singly-linked lists. In Conf. on Verification, Model Checking and Abstract Interpretation (VMCAI), pages 181–198, 2005. 28. S. McPeak and G. C. Necula. Data structure specifications via local equality axioms. In Conf. on Computer Aided Verification (CAV), pages 476–490, 2005. 29. A. Møller and M. I. Schwartzbach. The pointer assertion logic engine. In Conf. on Programming Language Design and Implementation (PLDI), pages 221–231, 2001. 30. G. Nelson. Techniques for program verification. PhD thesis, Stanford University, 1979. 31. G. Nelson. Verifying reachability invariants of linked structures. In Symp. on Principles of Programming Languages (POPL), pages 38–47, 1983. 32. M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM Trans. on Programming Languages and Systems, 24(3):217–298, 2002. 33. G. Yorsh, T. Reps, and M. Sagiv. Symbolically computing most-precise abstract operations for shape analysis. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 530–545, 2004. Monitoring Off-the-Shelf Components A. Prasad Sistla, Min Zhou, and Lenore D. Zuck University of Illinois at Chicago {sistla, mzhou, lenore}@cs.uic.edu Abstract. Software is being developed from off-the-shelf third party components. The interface specification of such a component may be under specified or may not fully match the user requirement. In this paper, we address the problem of customizing such components to particular users. We achieve this by constructing a monitor that monitors the component and detects any bad behaviors. Construction of such monitors essentially involves synthesizing safety properties that imply a given property that is obtained from the interface specifications of the component and the goal specification of the user. We present various methods for synthesizing such safety properties when the given property is given by an automaton or a temporal logic formula. We show that our methods are sound and complete. These results are extensions of the results given in [11]. 1 Introduction The process of constructing software is undergoing rapid changes. Instead of a monolithic software development within an organization, increasingly, software is being assembled using third-party components (e.g., JavaBeans, .NET, etc.). The developers have little knowledge of, and even less control over, the internals of the components comprising the overall system. One obstacle to composing agents is that current formal methods are mainly concerned with “closed” systems that are built from the ground up. Such systems are fully under the control of the user. Hence, problems arising from ill-specified components can be resolved by a close inspection of the systems. When composing agents use “off-the-shelf” ones, this is often no longer the case. 
Out of consideration for proprietary information, or in order to simplify presentation, companies may provide incomplete specifications. Worse, some agents may have no description at all except one that can be obtained by experimentation. Even if the component is completely specified, it may not fully satisfy the user requirements. In either case, i.e., ill-specified or mismatched, "off-the-shelf" components might still be attractive enough that the designer of a new service may wish to use them. In order to do so safely, the designer must be able to deal with the possibility that these components may exhibit undesired or unanticipated behavior, which could potentially compromise the correctness and security of the new system. The main problem addressed in this paper is that of customizing ill-specified or slightly mismatched off-the-shelf components for a particular user. We assume that we are given the interface specification ΦI of the off-the-shelf component and the goal specification Φ which denotes the user requirement. We want to design a module M which runs in parallel with the off-the-shelf component and monitors its executions. If the execution violates the user specification Φ then the monitor M indicates this so that corrective action may be taken. Our customization consists only of monitoring the executions. (Here we are assuming that violation of Φ is not catastrophic and can be remedied by some other means provided it is detected in time; for example, leakage of the credit card number in a business transaction can be handled by alerting the credit card company.) See [11] for a motivating example. Our goal is to obtain a specification φ for the module M so that M composed with the off-the-shelf component implies Φ. Furthermore, we want φ to be a safety property, since violations of such properties can be monitored. Once such a specification φ is obtained as an automaton, it is straightforward to construct the monitor M. Essentially, M would run the automaton on the executions of the off-the-shelf component and detect violations. Thus, given ΦI and Φ, our goal is to synthesize a safety property φ so that ΦI ∧ φ → Φ, or equivalently φ → (¬ΦI ∨ Φ), is a valid formula. We considered the above problem in our previous work [11]. In that work, we concentrated on obtaining φ as a deterministic Büchi automaton. There we showed that while there is always some safety property φ that guarantees φ → (¬ΦI ∨ Φ) (e.g., the trivially false property), in general there is no "maximal" such safety property. We also synthesized a family of safety properties φk, such that the higher k is, the more "accurate" and costly to compute φk is. We also defined a class of, possibly infinite state, deterministic bounded automata that accept the desired property φ. For these automata we proved a restricted completeness result showing that if (¬ΦI ∨ Φ) is specified by a deterministic Büchi automaton A and L(A) is the language accepted by A, then for every safety property S contained in L(A) there exists a bounded automaton that accepts S. (Actually, the paper [11] erroneously stated that this method is complete in general; this was corrected in a revised version [12] claiming only restricted completeness.)
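Before turning to the extensions, it may help to see how little machinery the monitor M itself needs once φ is given as a deterministic safety automaton. The following minimal sketch is ours, with hypothetical names; delta is assumed to return the successor state, or None when the observed prefix already violates φ:

class Monitor:
    def __init__(self, initial, delta):
        self.state = initial
        self.delta = delta
        self.violated = False

    def observe(self, event):
        """Feed one observable event of the component's execution."""
        if self.violated:
            return False
        nxt = self.delta(self.state, event)
        if nxt is None:
            # Safety violations occur on a finite prefix, so they can
            # be flagged as soon as no transition is available.
            self.violated = True
            return False
        self.state = nxt
        return True

This is why the synthesized φ must be a safety property: a violation is always witnessed by a finite prefix, at which point corrective action can be taken.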
In this paper we extend these earlier results as follows. We consider the cases where ¬ΦI ∨ Φ is given as an automaton or as an LTL formula. In the former case, when ¬ΦI ∨ Φ is described by an automaton, we describe two ways of synthesizing the safety properties as a Büchi automaton B. The first method assumes that the given automaton A is a non-deterministic Büchi automaton and constructs a non-deterministic B from A by associating a counter with each state. The constructed automaton is much simpler than the one given in [11]. We also define a class of infinite state automata and show that all these automata accept safety properties contained in L(A). We also prove a restricted completeness result for this case. In the second method we assume that A is given as a deterministic Streett automaton. For this case, we give the construction of a class of possibly infinite state automata that accept safety properties contained in L(A). This construction employs a pair of counters for each accepting pair in the accepting condition of A. We show that this construction is sound and is also complete when A is deterministic. Since deterministic Streett automata are more powerful than deterministic Büchi automata, this method is provably more powerful than the one given in [11]. Also, we can obtain a complete method for synthesizing safety properties contained in the language of a given non-deterministic Büchi or Streett automaton by systematically converting it into an equivalent deterministic Streett automaton [14] and by employing the above method on the resulting automaton. In the case that ΦI, Φ, and hence ¬ΦI ∨ Φ, are given as LTL formulas, we give semantic and syntactic methods. The semantic method constructs the tableau, from which it constructs a non-deterministic Büchi automaton that accepts the desired safety property. The syntactic method converts the formula ¬ΦI ∨ Φ into another formula that specifies a safety property which implies ¬ΦI ∨ Φ.

Outline. Section 2 contains definitions, notation, and outlines some prior results relevant to this work. Section 3 studies synthesis of safety from specifications given by non-deterministic Büchi automata and shows a partial completeness result. Section 4 studies synthesis of safety from specifications given by deterministic Streett automata and shows a completeness result. Section 5 studies synthesis of safety from specifications given by LTL formulae. Section 6 discusses related literature, and we conclude in Section 7.

2 Preliminaries
Sequences. Let S be a finite set. Let σ = s0, s1, . . . be a possibly infinite sequence over S. The length of σ, denoted |σ|, is defined to be the number of elements in σ if σ is finite, and ω otherwise. If α1 is a finite sequence and α2 is either a finite or an ω-sequence, then α1 α2 denotes the concatenation of the two sequences in that order. For integers i and j such that 0 ≤ i ≤ j < |σ|, σ[i, j] denotes the (finite) sequence si, . . . , sj. A prefix of σ is any σ[0, j] for j < |σ|. We denote the set of σ's prefixes by Pref(σ). Given an integer i, 0 ≤ i < |σ|, we denote by σ^(i) the suffix of σ that starts with si. For an infinite sequence σ : s0, . . ., we denote by inf(σ) the set of S-elements that occur in σ infinitely many times, i.e., inf(σ) = {s : si = s for infinitely many i's}.

Languages. A language L over a finite alphabet Σ is a set of finite or infinite sequences over Σ.
When L consists only of infinite strings (sequences), we sometimes refer to it as an ω-language. For a language L, we denote the set of prefixes of L by Pref(L), i.e., Pref(L) = ∪σ∈L Pref(σ). Following [6, 2], an ω-language L is a safety property if for every σ ∈ Σ^ω:

Pref(σ) ⊆ Pref(L) ⟹ σ ∈ L

i.e., L is a safety property if it is limit closed – for every ω-string σ, if every prefix of σ is a prefix of some L-string, then σ must be an L-string.

Büchi Automata. A Büchi automaton (NBA for short) A on infinite strings is described by a quintuple (Q, Σ, δ, q0, F) where:
– Q is a finite set of states;
– Σ is a finite alphabet of symbols;
– δ : Q × Σ → 2^Q is a transition function;
– q0 ∈ Q is an initial state;
– F ⊆ Q is a set of accepting states.
The generalized transition function δ* : Q × Σ* → 2^Q is defined in the usual way, i.e., for every state q, δ*(q, ε) = {q}, and for any σ ∈ Σ* and a ∈ Σ, δ*(q, σa) = ∪q′∈δ*(q,σ) δ(q′, a). If for every (q, a) ∈ Q × Σ, |δ(q, a)| = 1, then A is called a deterministic Büchi automaton (or DBA for short). Let σ : a1, . . . be an infinite sequence over Σ. A run r of A on σ is an infinite sequence r0, r1, . . . over Q such that:
– r0 = q0;
– for every i > 0, ri ∈ δ(ri−1, ai).
A run r of a Büchi automaton is accepting if inf(r) ∩ F ≠ ∅. The automaton A accepts the ω-string σ if it has an accepting run over σ (for the case of DBAs, the automaton has a single run over σ). The language accepted by A, denoted by L(A), is the set of ω-strings that A accepts. A language L is called ω-regular if it is an ω-language that is accepted by some (possibly non-deterministic) Büchi automaton. A Büchi automaton A can also be used to define a regular automaton that is just like A, only the acceptance condition of a run r is that its last state is accepting. We denote the regular language accepted by the regular version of A by Lf(A).

Infinite-state Büchi automata are defined just like Büchi automata, only the set of states may be infinite. We denote infinite-state DBAs by iDBAs, and infinite-state NBAs by iNBAs.

Streett Automata. A Streett automaton S on infinite strings is described by a quintuple (Q, Σ, δ, q0, F) where Q, Σ, δ, and q0 are just like in Büchi automata, and F is of the form ∪_{i=1}^{m} {(Ui, Vi)} where each Ui, Vi ⊆ Q. A run r of S on σ = a1, . . . is defined just like in the case of a Büchi automaton. The run is accepting if, for every i = 1, . . . , m, if inf(r) ∩ Ui ≠ ∅ then inf(r) ∩ Vi ≠ ∅, i.e., if some Ui state appears infinitely often in r, then some Vi state should also appear infinitely often in r. Every Büchi automaton can be converted into a deterministic Streett automaton that recognizes the same ω-language ([14]).
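For ultimately periodic runs r = stem·loop^ω, inf(r) is exactly the set of states on the loop, so both acceptance conditions reduce to simple set tests. The following helper functions are our own illustration, not part of the paper:

def buechi_accepts(loop, F):
    # some accepting state occurs infinitely often
    return bool(set(loop) & set(F))

def streett_accepts(loop, pairs):
    inf = set(loop)
    # for every pair (U, V): if inf meets U then inf must meet V
    return all(not (inf & set(U)) or bool(inf & set(V)) for U, V in pairs)

print(streett_accepts(['q1', 'q2'], [({'q1'}, {'q2'}), ({'q9'}, set())]))  # True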
Linear-time Temporal Logic. We consider LTL formulae over a set of atomic propositions Π using the boolean connectives and the temporal operators ◯ (next), U (until), and W (weak until or unless). Let Σ = 2^Π. We define a satisfiability relation |= between infinite sequences over Σ and formulae by:
– For a proposition p ∈ Π, σ |= p iff p ∈ σ0;
– σ |= φ ∨ ψ iff σ |= φ or σ |= ψ;
– σ |= ¬φ iff σ |≠ φ;
– σ |= ◯φ1 iff σ^(1) |= φ1;
– σ |= φ1 U φ2 iff for some i ≥ 0, σ^(i) |= φ2, and for all j, 0 ≤ j < i, σ^(j) |= φ1;
– σ |= φ1 W φ2 iff either σ^(j) |= φ1 for all j ≥ 0, or for some i ≥ 0, σ^(i) |= φ1 ∧ φ2 and for all j, 0 ≤ j < i, σ^(j) |= φ1.
Note that the semantics of the unless operator is slightly different from the usual one, in requiring φ1 to hold at the state where φ2 is first encountered. For every LTL formula φ, we denote the set of atomic propositions that appear in φ by Prop(φ), and the ω-language that φ defines, i.e., the set of infinite sequences (models) that satisfy φ, by L(φ).

Let φ be an LTL formula. We define the closure of φ, denoted cl(φ), to be the minimal set of formulae that is closed under negation and includes φ, every subformula of φ, and, for every subformula φ1 W φ2 of φ, the formula ¬φ2 U ¬φ1. The atoms of φ, denoted at(φ), form a subset of 2^{cl(φ)} \ {∅} such that each atom A ∈ at(φ) is a maximally consistent subset of cl(φ), where for every ψ = φ1 W φ2 ∈ cl(φ), ψ ∈ A iff (¬φ2 U ¬φ1) ∉ A. An initial atom is any atom that contains φ. The tableau of φ, tab(φ), is a graph (at(φ), R) whose nodes are at(φ), and (A1, A2) ∈ R iff the following all hold:
– For every ◯ψ ∈ cl(φ), ◯ψ ∈ A1 iff ψ ∈ A2;
– For every ψ1 U ψ2 ∈ cl(φ), ψ1 U ψ2 ∈ A1 iff ψ2 ∈ A1, or ψ1 ∈ A1 and ψ1 U ψ2 ∈ A2;
– For every ψ1 W ψ2 ∈ cl(φ), ψ1 W ψ2 ∈ A1 iff ψ1, ψ2 ∈ A1, or ψ1 ∈ A1 and ψ1 W ψ2 ∈ A2.
It is known (e.g., [9]) that φ is satisfiable iff tab(φ) contains a path leading from an initial atom into a maximal strongly connected component (MSCC) C such that for every ψ1 U ψ2 ∈ A ∈ C, there is an atom B ∈ C such that ψ2 ∈ B. (Such MSCCs are called "fulfilling.") Similarly, it is also known (see, e.g., [18, 3]) how to construct an NBA (and, consequently, a deterministic Streett automaton) that recognizes L(φ).

3 Synthesizing Safety from Büchi Automata
In this section we study synthesis of safety properties from a given NBA. We fix a (possibly non-deterministic) Büchi automaton A = (QA, Σ, δA, q0A, FA). As shown in [11], unless L(A) is already safety, there is no maximal safety property contained in L(A) which is Büchi recognizable. We first show an infinite chain of safety properties that are all Büchi recognizable, each contained in the next. We then present bounded Büchi automata, an infinite-state version of NBAs, and show that each accepts a safety property in L(A). We also show a partial completeness result, namely, that if A is deterministic then each safety property contained in L(A) is accepted by some bounded automaton. Both constructions, of the chain of Büchi automata and of the bounded automata, are much simplified versions of their counterparts described in [11].

3.1 Synthesis of Safety into Büchi Automata
We define a chain of safety properties {Lk(A)}k>0 such that for every k, Lk(A) ⊆ L(A) and Lk(A) ⊆ Lk+1(A). Let k > 0 be an integer. The ω-language Lk(A) is a subset of L(A) where every string has an accepting A-run in which the first accepting (FA) state appears within the first k states of the run, and any successive accepting states in the run are separated by at most k states. Formally, w ∈ Lk(A) iff there exists an accepting A-run r0, r1, . . . on w such that the set of indices I = {i ≥ 0 : ri ∈ FA} satisfies the following condition: for some ℓ < k, ℓ ∈ I, and for every i ∈ I, there is some j ∈ I such that i < j ≤ i + k.

For k ≥ 0, let Bk = (Qk, Σ, δk, ⟨q0A, k − 1⟩, Qk) be an NBA where:
– Qk = QA × {0, 1, . . . , k − 1};
– ⟨q′, i′⟩ ∈ δk(⟨q, i⟩, a) iff q′ ∈ δA(q, a) and either
• i > 0, q ∉ FA, and i′ = i − 1, or
• i ≥ 0, q ∈ FA, and i′ = k − 1.
The automaton Bk simulates the possible runs of A on the input and uses a modulo-k counter to maintain the number of steps within which an accepting state should be reached. Note that, when q ∉ FA, there are no outgoing transitions from the state ⟨q, 0⟩.
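The construction is mechanical enough to transcribe directly (our sketch; delta_A is assumed to be a function from (state, symbol) to an iterable of A-successors). States are pairs (q, i), where i is the budget of steps remaining before an accepting state must be visited; the initial state is (q0_A, k − 1), and every state of Bk is accepting:

def make_Bk(delta_A, F_A, k):
    F_A = set(F_A)
    def delta_k(state, a):
        q, i = state
        if q in F_A:
            # budget replenished after visiting an accepting state
            return [(q2, k - 1) for q2 in delta_A(q, a)]
        if i > 0:
            return [(q2, i - 1) for q2 in delta_A(q, a)]
        return []   # q is not accepting and the budget is exhausted
    return delta_k

Because rejection only ever happens by getting stuck on a finite prefix, the recognized language is limit closed, which is the content of Lemma 1 below.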
Since all Bk-states are accepting, L(Bk) is a safety property. Also, from the construction it immediately follows that L(Bk) = Lk(A). We therefore conclude:

Lemma 1. Lk(A) is a safety property, and L(Bk) = Lk(A).

3.2 Synthesis of Safety into Bounded Automata
The construction of the previous section is not complete, in the sense that there are always safety properties in L(A) that are not recognized by any Bk. We introduce bounded automata, a new type of Büchi automata, and show that (1) they recognize only safety properties in L(A), and (2) when A is deterministic, they can recognize every safety property contained in L(A). Assume some (possibly infinite) set Y. A bounded automaton over A using Y is an (i)NBA described by a tuple N = (QN, Σ, δN, q0N, FN) where:
– QN ⊆ Y × QA × (N ∪ {∞}). Given a state qN = ⟨r, q, i⟩ ∈ QN, we refer to r as the Y-component of qN, to q as the A-state of qN, and to i as the counter of qN;
– For every ⟨r, q, i⟩ ∈ QN, if ⟨r′, q′, i′⟩ ∈ δN(⟨r, q, i⟩, a) then the following all hold:
• q′ ∈ δA(q, a);
• if i = ∞ then i′ = ∞;
• if q ∉ FA, then i′ < i or i′ = ∞;
– q0N is in Y × {q0A} × N;
– FN = Y × QA × N.
Note that there are no outgoing transitions from a state whose A-state is not in FA and whose counter is 0. Also, once a run reaches a state with counter value ∞, it remains in such states. Since the counters of states with non-accepting A-states either decrease or become ∞, it follows that once a run reaches a rejecting state (i.e., a state in QN \ FN), it remains there. It thus follows from [16] that L(N) is a safety property. Consider an accepting N-run. Since the counters of states can only decrease finitely many times from non-accepting A-states, it follows that the run has infinitely many accepting A-states. Thus, its projection on the A-states is an accepting A-run. We can therefore conclude:

Lemma 2. For a bounded automaton N over A, L(N) is a safety property contained in L(A).

Lemma 2 shows that bounded automata over A accept only safety properties that are contained in L(A). We next identify the safety properties in L(A) that bounded automata accept. Recall that for a Büchi automaton A, Lf(A) is the regular language defined by the regular version of A. Let S ⊆ L(A) be a safety property. Similarly to [11], we define:
– For a sequence σ ∈ S and α ∈ Pref(σ), let min_idx(σ, α) = min{|β| : αβ ∈ Lf(A) ∩ Pref(σ)};
– For any α ∈ Σ*, let Z(α, S) = {min_idx(σ, α) : σ ∈ S ∧ α ∈ Pref(σ)}.
Note that if α ∈ Lf(A) ∩ Pref(σ), then min_idx(σ, α) = 0; thus, if α ∈ Lf(A) ∩ Pref(S), then Z(α, S) = {0}. Similarly, if α ∉ Pref(S), then Z(α, S) = ∅. It is shown in [11] that for every α ∈ Σ*, Z(α, S) is finite. For any α ∈ Σ*, we define idx(α, S) = max Z(α, S) if Z(α, S) ≠ ∅, and idx(α, S) = ∞ otherwise. Thus, idx(α, S) ∈ N iff α ∈ Pref(S). Let S ⊆ L(A) be a safety property. A bounded S-automaton over A is a bounded automaton over A using Σ*, of the form D = (QD, Σ, δD, q0D, FD), where:
– QD = {⟨α, q, i⟩ : α ∈ Σ*, q ∈ QA, i ∈ {idx(α, S), ∞}};
– For every ⟨α, q, i⟩ ∈ QD, if ⟨α′, q′, i′⟩ ∈ δD(⟨α, q, i⟩, a) then the following all hold:
1. α′ = αa;
2. q′ ∈ δA(q, a);
3. if q ∈ FA then i′ = idx(αa, S); otherwise, i′ = idx(αa, S) if idx(αa, S) < i, and i′ = ∞ otherwise;
– q0D = ⟨ε, q0A, idx(ε, S)⟩;
– FD = Σ* × QA × N.

Theorem 1 (Partial Completeness). For a safety property S ⊆ L(A), L(D) ⊆ S, and if A is a deterministic automaton then L(D) = S.

Proof. For (1), we show that for any σ ∈ Σ^ω, if σ ∉ S, then σ ∉ L(D). Assume σ ∈ Σ^ω \ S.
Since S is a safety property, there exists an integer i such that for every prefix α ≺ σ of length i or more, α ∉ Pref(S). Consider such a prefix α. Obviously, idx(α, S) = ∞. Hence, in any run, after reading α, the counter of the state reached is ∞. It follows that σ ∉ L(D). For (2), assume that σ ∈ S. Define a sequence {pi}i≥0 over Σ* such that p0 = ε and for every i > 0, pi = σ[0, i − 1]; i.e., {pi}i≥0 is the sequence of σ's prefixes. Since S ⊆ L(A), there exists an accepting run r0, r1, . . . of A on σ. Consider now the sequence of D-states ⟨pj, rj, idx(pj, S)⟩ for j ≥ 0. The first element of this sequence is q0D. Consider now the case that A is deterministic. When the A-state of a state in this sequence is A-accepting, its counter is 0; otherwise, the counter of the next state in the sequence is lower. Thus, the sequence is a D-run. Finally, since σ ∈ S, every counter in the sequence is non-∞; thus D accepts. □

To see why the method is incomplete for general NBAs, consider the NBA A described in Figure 1. Obviously, L(A) = L1 ∪ L2 where L1 = {a^i b Σ^ω : i > 0} ∪ {a^ω} and L2 = {Σ^i c^ω : i ≥ 0}. Note that L1 is generated by the sub-automaton consisting of {q0, q1, q2, q3} and L2 is generated by the sub-automaton consisting of {q0, q4, q5}. The language L1 is clearly a safety property; however, the above method cannot generate L1, since any bounded automaton where the value of the counter in the initial state is k can only accept the L1-strings of the form {a^i b Σ^ω : 0 < i ≤ k} ∪ {a^ω}.

Fig. 1. Automaton A (an NBA over {a, b, c} with states q0, . . . , q5; L1 is generated by the sub-automaton on {q0, q1, q2, q3} and L2 by the sub-automaton on {q0, q4, q5}).

4 Synthesizing Safety from Streett Automata
In this section, we generalize the construction of the bounded automata of the previous section into extended bounded automata for synthesizing safety properties contained in the language of a given Streett automaton (SA).
It thus follows from [16] that extended bounded automata can only recognize safety properties.

Consider an accepting run ρ = R1, . . . of E, where for every k ≥ 0, Rk = ⟨rk, qk, ik,1, jk,1, . . . , ik,m, jk,m⟩. For Q′ ⊆ QA and k ≥ 0, we say that a Q′-state appears in Rk if qk ∈ Q′. Assume that for some ℓ ∈ [1..m], Uℓ appears infinitely many times in ρ's states, and let k ≥ 0. Consider now the sequence of ik′,ℓ for k′ ≥ k. Since ρ is accepting, and there are infinitely many Rk′ where a Uℓ-state appears, ik′,ℓ never increases until some Vℓ-state appears, and decreases with each appearance of a Uℓ-state. Once ik′,ℓ becomes zero, the value of jk′,ℓ decreases with each appearance of a Uℓ-state. Thus, a Vℓ-state must appear in ρ after Rk. It thus follows that ρ, projected onto its QA-states, is an accepting run of A. We can therefore conclude:

Theorem 2 (Soundness). For every extended bounded automaton E over A, the language recognized by E is a safety property that is contained in L(A).

We now turn to prove completeness, i.e., we show that if A is deterministic then every safety property in L(A) is recognized by some extended bounded automaton. We fix some safety property S ⊆ L(A) and show it is recognized by some extended bounded automaton. For the proof we define and prove some properties of (finitely branching) infinite labeled trees. For details on the definitions and proofs see the complete version of this paper in [15]. For space reasons, we only outline the main ideas here.

A tree is finitely branching if each node has finitely many children. Consider a finitely branching tree T with a labeling function ℓ from T's nodes that labels each node with one of three (mutually disjoint) labels, lab1, lab2, and lab3. An infinite path in T is acceptable with respect to ℓ over (lab1, lab2, lab3) if it either contains only finitely many nodes with lab1-labels (lab1-nodes), or it contains infinitely many nodes with lab2-labels (lab2-nodes). The tree T is acceptable with respect to ℓ if each of its infinite paths is, and it is acceptable if it is acceptable with respect to some labeling function as above.

A ranking function on T is a function associating each node with a non-negative integer. A pair (ρ1, ρ2) of ranking functions is good if for every two nodes n and n′ in T such that n is the parent of n′, the following hold:

1. If n is a lab1-node and ρ1(n) > 0 then ρ1(n) > ρ1(n′).
2. If n is a lab1-node and ρ1(n) = 0 then ρ1(n′) = 0 and ρ2(n) > ρ2(n′).
3. If n is a lab3-node and ρ1(n) > 0 then ρ1(n) ≥ ρ1(n′).
4. If n is a lab3-node and ρ1(n) = 0 then ρ1(n′) = 0 and ρ2(n) ≥ ρ2(n′).

Theorem 3. [15] A labeled finitely-branching tree T is acceptable iff there is a good pair of ranking functions for it.
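The four conditions are local to a parent/child edge, so checking a candidate pair of rankings amounts to one traversal of the tree. A small sketch (labels and rank values encoded as plain Python values; all names are illustrative):

```python
def good_edge(label_n, r1_n, r2_n, r1_c, r2_c):
    """Conditions 1-4 above for one edge from a parent n to a child n',
    given n's label and the rank values rho1, rho2 of both nodes."""
    if label_n == 'lab1':                     # conditions 1 and 2
        if r1_n > 0:
            return r1_n > r1_c
        return r1_c == 0 and r2_n > r2_c
    if label_n == 'lab3':                     # conditions 3 and 4
        if r1_n > 0:
            return r1_n >= r1_c
        return r1_c == 0 and r2_n >= r2_c
    return True                               # lab2-nodes impose no constraint
```

A pair (ρ1, ρ2) is good iff good_edge holds on every edge of T.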
We can now prove the completeness theorem:

Theorem 4. Let A be a deterministic Streett automaton and S ⊆ L(A) be a safety property. There exists an extended bounded automaton B such that L(B) = S.

Proof. Consider the finitely branching tree T whose set of nodes is {(α, q) : α ∈ Pref(S), q = δ*A(q0A, α)}, whose root is (ε, q0A), and in which, for any two nodes n = (α, q) and n′ = (α′, q′), n′ is a child of n iff for some a ∈ Σ, α′ = α·a and q′ ∈ δA(q, a). Then, for an infinite path π starting from the root, its projection on its second component is an accepting run of A on the string which is the limit of its first projection (on Pref(S)).

For every k = 1, . . . , m, let ℓk be a labeling of T by the labels u, v, and n such that every Uk-node in T (i.e., a node whose second component is in Uk) is labeled with u, every Vk-node is labeled with v, and every node that is in neither Uk nor Vk is labeled with n. It follows that T is acceptable with respect to the labeling ℓk over (u, v, n). It follows from Theorem 3 that for every k = 1, . . . , m, there exists a pair (ρk,1, ρk,2) of ranking functions such that for every two nodes n = (α, q) and n′ = (α′, q′) such that n is the parent of n′, the following all hold:

– If n is a Uk-node then
  • if ρk,1(n) > 0 then ρk,1(n′) < ρk,1(n);
  • if ρk,1(n) = 0 then ρk,1(n′) = 0 and ρk,2(n′) < ρk,2(n);
– If n is neither a Uk- nor a Vk-node, then
  • if ρk,1(n) > 0 then ρk,1(n′) ≤ ρk,1(n);
  • if ρk,1(n) = 0 then ρk,1(n′) = 0 and ρk,2(n′) ≤ ρk,2(n).

We now define an extended bounded automaton E = (QE, Σ, δE, q0E, FE) where:

– For every (α, q, i1, j1, . . . , im, jm) ∈ QE,
  1. q = δ*A(q0A, α);
  2. if (α, q) ∈ T then, for each k = 1, . . . , m, ik = ρk,1((α, q)) and jk = ρk,2((α, q));
  3. if (α, q) ∉ T then, for each k = 1, . . . , m, ik = jk = ∞.
  Note that this definition guarantees that for every finite string α and A-state q, there is a unique E-state whose first two coordinates are α and q.
– For every state R = (α, q, . . .) ∈ QE and symbol a ∈ Σ, δE(R, a) is the unique QE-state whose first two coordinates are α·a and δA(q, a);
– q0E, the initial state of E, is the unique state whose first two coordinates are ε and q0A;
– FE is the set of states such that for each k = 1, . . . , m, ik, jk ≠ ∞.

From the properties of the ranking functions it now follows that L(E) = S. □

5 Synthesizing Safety from Temporal Formulae

In Sections 3 and 4 we described how to synthesize safety properties from NBAs and SAs. In this section we discuss how to directly synthesize a safety property from an ltl formula. Let φ be an ltl formula. As discussed in Section 2, one can construct tab(φ) and then an NBA, or a DSA, that recognizes L(φ), after which any of the approaches described in Section 3 can be used. However, such approaches may drastically alter tab(φ). In this section we describe two methods to obtain safety properties from tab(φ) while preserving its structure. The first is semantics-based, and consists of an augmentation of tab(φ). The second is syntax-based, and uses tab(φ) for monitoring purposes.

5.1 A Semantic Approach

Let φ be an ltl formula and consider tab(φ) = (at(φ), R). We construct an expanded version of the tableau. We outline the construction here and defer the formal description to the full version of the paper: With each formula ψ = ψ1 U ψ2 ∈ cl(φ) we associate a counter cψ that can be "active" or "inactive". (Thus, we split atoms into new states, each containing the atoms and the counters.) If (A1, A2) ∈ R, then in the new structure, if the counter associated with ψ in A1 is non-zero and active, and ψ2 ∉ A2, then the counter is decremented; if ψ2 ∈ A2, the counter becomes inactive. If the value of the counter is zero, the transition is disabled. If the counter is inactive and ψ ∈ A2, then the counter becomes active and is replenished to some constant k.

An initial node of the expanded tableau is one which corresponds to an initial atom, where only the counters of U-formulae that are in the node are active (and set to k), and the others are inactive.
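As a concrete reading of this augmentation, here is a minimal Python sketch of the counter update along one tableau edge (A1, A2). The encoding of formulas and atoms as hashable values, the dictionary layout, and the constant K are our illustrative assumptions.

```python
K = 8  # hypothetical replenishment constant k; any positive bound works

def step_counters(counters, target_atom, until_rhs, k=K):
    """One transition (A1, A2) of the expanded tableau. counters maps each
    until-formula psi to (value, active); until_rhs maps psi to its
    right-hand side psi2; target_atom is the set of formulas in A2.
    Returns the successor counters, or None if the transition is disabled."""
    succ = {}
    for psi, psi2 in until_rhs.items():
        value, active = counters[psi]
        if active:
            if psi2 in target_atom:      # right-hand side reached: deactivate
                succ[psi] = (value, False)
            elif value == 0:             # active, zero, psi2 not reached: disabled
                return None
            else:                        # still waiting: pay one tick
                succ[psi] = (value - 1, True)
        elif psi in target_atom:         # psi re-promised in A2: replenish
            succ[psi] = (k, True)
        else:
            succ[psi] = (value, False)
    return succ
```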
Obviously, a "good" path in the new structure is one that starts at an initial node and, for every U-formula in cl(φ), either the counter of the formula is infinitely often inactive, or it is infinitely often equal to k. In the full version of the paper we will show how the new structure defines an NBA that accepts safety properties in L(φ).

5.2 A Syntactic Approach

Let φ be an ltl formula where all negations are at the level of propositions. This is achieved by the following rewriting rules:

¬(ψ1 ∨ ψ2) ⟹ (¬ψ1 ∧ ¬ψ2)
¬(ψ1 ∧ ψ2) ⟹ (¬ψ1 ∨ ¬ψ2)
¬Xψ ⟹ X¬ψ
¬(ψ1 U ψ2) ⟹ (¬ψ2 W ¬ψ1)
¬(ψ1 W ψ2) ⟹ (¬ψ2 U ¬ψ1)

Let k be a positive integer. We construct an ltl formula φk by replacing each sub-formula of the form ψ1 U ψ2 appearing in φ with ψ1 U≤k ψ2, where U≤k is the bounded until operator (i.e., U≤k guarantees its right-hand side within k steps). The following theorem, which gives a syntactic method for synthesizing safety properties, can be proven by induction on the length of φ. The monitor for φk can then be built by obtaining its tableau.

Theorem 5. Let φ be a temporal formula, let k be a positive integer, and let φk be as defined above. Then L(φk) is a safety property which implies φ.
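Both rewriting passes are easy to phrase over a small formula datatype. A sketch follows; the tuple encoding, the operator tags ('wuntil' for W, 'buntil' for the bounded until U≤k), and the double-negation rule are our own illustrative choices.

```python
def nnf(f):
    """Push negations down to the propositions using the rules above."""
    if not isinstance(f, tuple):         # a proposition
        return f
    if f[0] == 'not':
        g = f[1]
        if not isinstance(g, tuple):     # negated proposition: already in NNF
            return f
        if g[0] == 'not':                # double negation (not listed above,
            return nnf(g[1])             # but convenient)
        if g[0] == 'or':
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'and':
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'next':
            return ('next', nnf(('not', g[1])))
        if g[0] == 'until':              # not(f U g) => (not g) W (not f)
            return ('wuntil', nnf(('not', g[2])), nnf(('not', g[1])))
        if g[0] == 'wuntil':             # not(f W g) => (not g) U (not f)
            return ('until', nnf(('not', g[2])), nnf(('not', g[1])))
    return (f[0],) + tuple(nnf(x) for x in f[1:])

def bound(f, k):
    """phi_k: replace every psi1 U psi2 by the bounded until psi1 U<=k psi2."""
    if not isinstance(f, tuple):
        return f
    args = tuple(bound(x, k) for x in f[1:])
    return ('buntil', k) + args if f[0] == 'until' else (f[0],) + args
```

For example, bound(nnf(('not', ('until', 'p', 'q'))), 8) leaves the resulting W untouched, while bound(('until', 'p', 'q'), 8) yields ('buntil', 8, 'p', 'q').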
6 Related Work

As indicated in the introduction, in [11, 12] we studied the problem of synthesizing safety from Büchi specifications and presented a solution that satisfies only a restricted completeness, in the sense that not all safety properties can be synthesized. The work here presents a solution that is both simpler and complete: given an NBA A, the construction here generates any safety property that is in L(A). In addition, the work here presents synthesis of safety properties directly from ltl properties. These methods are much simpler than the ones given in [11].

Similar in motivation to ours, but quite different in approach, is the work in [13]. There, the interaction between the module and the interface is viewed as a 2-player game, where the interface has a winning strategy if it can guarantee that no matter what the module does, Φ is met while maintaining ΦI. The work there only considers deterministic Büchi automata. The approach here is to synthesize the interface behavior, expressed by (possibly) non-deterministic automata, before constructing the module.

Some of the techniques we employ are somewhat reminiscent of techniques used for verifying that a safety property described by a state machine satisfies a correctness specification given by an automaton or temporal logic. For example, simulation relations/state-functions together with well-founded mappings [5, 1, 17] have been proposed in the literature for this purpose. Our bounded automata use a form of well-founded mappings in the form of positive integer values that are components of each state. (This is as it should be, since we need to use some counters to ensure that an accepting state eventually appears.) However, here we are not trying to establish the correctness of a given safety property defined by a state machine; rather, we are deriving safety properties that are contained in the language of an automaton.

In [7, 8] Larsen et al. propose a method for turning an implicit specification of a component into an explicit one, i.e., given a context specification (there, a process-algebraic expression with a hole, where the desired component needs to be plugged in) and an overall specification, they fully automatically derive a temporal safety property characterizing the set of all implementations which, together with the given context, satisfy the overall specification. While this technique has been developed for component synthesis, it can also be used for synthesizing optimal monitors in a setting where the interface specification ΦI and the goal specification Φ are both safety properties. In this paper, we do not make any assumptions on ΦI and Φ: they can be arbitrary properties specified in temporal logic or by automata. We aim at exploiting liveness guarantees of external components (contexts) in order to establish liveness properties of the overall system under certain additional safety assumptions, which we can check at run time (monitor). This allows us to guarantee that the overall system is as live as the context, as long as the constructed monitor does not cause an alarm.

There has been much work on monitoring violations of safety properties in distributed systems. In these works, the safety property is typically explicitly specified by the user. Our work is more about deriving safety properties from component specifications than about developing algorithms for monitoring given safety properties. In this sense, the approach of using safety properties for monitoring that have been automatically derived by observation, using techniques adapted from automata learning (see [4]), is closer in spirit to the technique here. Much attention has since been spent on optimizing the automatic learning of the monitors [10]. However, the learned monitors play a different role: whereas the learned monitors are good, but by no means complete, sensors for detecting unexpected anomalies, the monitors derived with the techniques of this paper imply the specified property as long as the guarantees of the component provider are true.

7 Conclusions and Discussion

In this paper, we considered the problem of customizing a given, off-the-shelf, reactive component to user requirements. In this process, we assume that the reactive module's external behavior is specified by a formula ΦI and the desired goal specification is given by a formula Φ. We presented methods for obtaining a safety specification φ so that φ → (¬ΦI ∨ Φ), by synthesizing (possibly infinite-state) NBAs for monitoring off-the-shelf components.

More specifically, we considered three different cases. The first one is when (¬ΦI ∨ Φ) is given as a non-deterministic Büchi automaton. For this case, the synthesized safety property is also given as a non-deterministic automaton. This method is shown to be sound, and complete when the given automaton is deterministic. It is simpler than the one given in [11]. The second case is when (¬ΦI ∨ Φ) is given as a Streett automaton. In this case also, we gave a sound method for synthesizing safety properties contained in the language of the automaton. The method is also shown to be complete if the given automaton is a deterministic Streett automaton.
Since every Büchi automaton and Streett automaton can be converted algorithmically into an equivalent deterministic Streett automaton, this method gives us a complete system for synthesizing any safety property contained in the language of a given Büchi automaton or Streett automaton. The last case is when ¬ΦI ∨ Φ is an LTL formula. In this case, we outlined a semantic method that works directly with the tableaux associated with formulae, without converting the tableaux into automata. We also gave a syntactic method for this.

For our automata to be useful, they should be recursive, i.e., their set of states and their transition functions should be recursive functions. For monitoring purposes we need not explicitly compute the automaton and keep it in memory; rather, we only need to maintain its current state and, whenever a new input arrives, we can use the transition function to compute the next state. For a non-deterministic automaton, we need to maintain the set of reachable states. Because of finite non-determinism, this set will be finite and is also computed on-the-fly after each input.

It is to be noted that the counters that are used only count down upon occurrences of some input symbols. If the off-the-shelf component never responds, the automaton remains in its current (good) state and permits such computations. This problem can be overcome by assuming that clock ticks are inputs to the automaton as well, and allowing a counter to count down with (some or all) clock ticks. There are also other possible solutions to this problem.

We implemented a preliminary version of the method given in Section 3. It will be interesting to implement the general version given in Section 4 and apply it to practical problems. It will also be interesting to investigate how the counter values, given in these constructions, can be computed as functions of the history seen thus far. Real-time implementations of such systems need to be further investigated.
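The on-the-fly view of a (possibly non-deterministic) monitor sketched above fits in a few lines of Python; init_states, delta and rejecting are placeholders for the recursive functions just discussed, not names from this paper.

```python
def make_monitor(init_states, delta, rejecting):
    """Monitor an automaton given implicitly by its transition function
    delta(state, symbol) -> set of successor states. We never build the
    automaton; we only maintain the set of currently reachable
    non-rejecting states, which finite non-determinism keeps finite. The
    monitored property is violated exactly when this set becomes empty."""
    current = {s for s in init_states if not rejecting(s)}

    def feed(symbol):
        nonlocal current
        current = {t for s in current for t in delta(s, symbol)
                   if not rejecting(t)}
        return bool(current)   # False signals an alarm

    return feed
```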
References

1. M. Abadi and L. Lamport. The existence of refinement mappings. In Proceedings of the ACM Symposium on Logic in Computer Science, 1988.
2. B. Alpern and F. Schneider. Defining liveness. Information Processing Letters, 21:181–185, 1985.
3. E. A. Emerson and A. P. Sistla. Triple exponential decision procedure for the logic CTL*. In Workshop on the Logics of Programs, Carnegie-Mellon University, 1983.
4. H. Hungar and B. Steffen. Behavior-based model construction. STTT, 6(1):4–14, 2004.
5. B. Jonsson. Compositional verification of distributed systems. In Proceedings of the 6th ACM Symposium on Principles of Distributed Computing, 1987.
6. L. Lamport. Logical foundation. In Distributed Systems: Methods and Tools for Specification, Springer-Verlag Lecture Notes in Computer Science 190, 1985.
7. K. Larsen. Ideal specification formalisms = expressivity + compositionality + decidability + testability + ... Invited lecture at CONCUR 1990, LNCS 458, 1990.
8. K. Larsen. The expressive power of implicit specifications. In ICALP 1991, LNCS 510, 1991.
9. O. Lichtenstein and A. Pnueli. Checking that finite state concurrent programs satisfy their linear specification. In Proc. 12th ACM Symp. Princ. of Prog. Lang., pages 97–107, 1985.
10. T. Margaria, H. Raffelt, and B. Steffen. Knowledge-based relevance filtering for efficient system-level test-based model generation (to appear). Innovations in Systems and Software Engineering, a NASA Journal, Springer Verlag.
11. T. Margaria, A. Sistla, B. Steffen, and L. D. Zuck. Taming interface specifications. In CONCUR 2005, pages 548–561, 2005.
12. T. Margaria, A. Sistla, B. Steffen, and L. D. Zuck. Taming interface specifications. www.cs.uic.edu/~sistla, 2005.
13. A. Pnueli, A. Zaks, and L. D. Zuck. Monitoring interfaces for faults. In Proceedings of the 5th Workshop on Runtime Verification (RV'05), 2005. To appear in a special issue of ENTCS.
14. S. Safra. On the complexity of ω-automata. In 29th Annual Symposium on Foundations of Computer Science, October 24–26, 1988, White Plains, New York, pages 319–327. IEEE Computer Society Press, 1988.
15. A. Sistla, M. Zhou, and L. D. Zuck. Monitoring off-the-shelf components. A companion to the VMCAI 06 paper, www.cs.uic.edu/~lenore/pubs, 2005.
16. A. P. Sistla. On characterization of safety and liveness properties in temporal logic. In Proceedings of the ACM Symposium on Principles of Distributed Computing, 1985.
17. A. P. Sistla. Proving correctness with respect to nondeterministic safety specifications. Information Processing Letters, 39:45–49, 1991.
18. M. Vardi, P. Wolper, and A. P. Sistla. Reasoning about infinite computations. In Proceedings of the IEEE Symposium on Foundations of Computer Science, 1983.

Parallel External Directed Model Checking with Linear I/O

Shahid Jabbar and Stefan Edelkamp

Computer Science Department, University of Dortmund, Dortmund, Germany
{shahid.jabbar, stefan.edelkamp}@cs.uni-dortmund.de

Abstract. In this paper we present Parallel External A*, a parallel variant of external memory directed model checking. As a model scales up, its successor generation becomes complex and, in turn, starts to impact the running time of the model checker. Probings of our external memory model checker IO-HSF-SPIN revealed that in some cases about 70% of the whole running time was consumed by internal processing. Employing a multiprocessor machine or a cluster of workstations, we can distribute the internal workload of the algorithm over multiple processors. Moreover, assuming a sufficient number of processors and open file pointers per process, the I/O complexity is reduced to linear by exploiting a hash-function based state space partition scheme.

1 Introduction

In explicit-state model checking software [3], state descriptors are often so large that main memory is often not sufficient for a lossless storage of the set of reachable states during the exploration, even if all available reduction techniques, like symmetry or partial-order reduction [20, 24], have been applied. Besides advanced implicit storage structures for the set of states [19], three different options have been proposed to overcome the internal space limitations of this so-called state explosion problem, namely directed, external, and parallel search.

Directed or heuristic search [23] guides the search process into the direction of the goal states, which in model checking safety properties is the set of software errors. The main observation is that using this guidance, the number of explored states needed to establish an error is smaller than with blind search. Moreover, directed model checking [29, 7] often reduces the length of the counter-example, which in turn eases the interpretation of the bug.

External search algorithms [26] store and explore the state space via hard disk access. States are flushed to and retrieved from disk. As virtual memory already can exceed main memory capacity, it can result in a slow-down due to excessive page faults if the algorithm lacks locality.
Hence, the major challenge in a good design for an external algorithm is to control the locality of the file access, where block transfers are favored over random accesses. Since hashing has a bad reputation for preserving locality, in external model checking [27, 11] duplicate elimination is delayed by applying a subsequent external sorting and scanning phase to the state set to be refined. During the algorithm only a part of the graph can be processed at a time; the remainder is stored on a disk. However, hard disk operations are about 10^5–10^6 times slower than main memory accesses. More severely, this latency gap rises dramatically: according to recent estimates, technological progress yields annual rates of 40–60 percent increase in processor speeds, while disk transfers only improve by 7 to 10%. On the other hand, the cost of large amounts of disk space has decreased considerably. At the time of writing, 500 gigabytes can be obtained at the cost of 300–400 US dollars.

Parallel or distributed search algorithms¹ are designed to solve algorithmic problems by using many processors / computers. An efficient solution can only be obtained if the organization of the different tasks can be optimized and distributed in a way that the working power is effectively used. Distributed model checking [28] tackles the state explosion problem by profiting from the amount of resources provided by parallel environments. A speedup is expected if the load is distributed uniformly with a low inter-process communication cost.

¹ As it refers to related work, even for this text terminology is not consistent. In AI literature, the term parallel search is preferred, while in model checking research, the term distributed search is commonly chosen. In theory, parallel algorithms commonly refer to a synchronous scenario (mostly according to a fixed architecture), while distributed algorithms are preferably used in an asynchronous setting. In this sense, the paper considers the less restricted distributed scenario.

In large-scale parallel breadth-first search [14], the state space is fully enumerated for increasing depth. Using this approach, a complete exploration of the Fifteen-Puzzle with 16!/2 states has been executed on six disks using a maximum of 1.4 terabytes of disk storage. In model checking, such an algorithm is important to verify safety properties in large state spaces.

In [11], the authors presented an external memory directed model checker that utilizes the hard disk to store the explored states. It utilizes heuristic estimates to guide the search towards the error state. In LTL model checking, as models scale up, the density of edges in the combined state space also increases. This, in turn, affects the complexity of successor generation. Probing our external directed model checker IO-HSF-SPIN [11] revealed some bottlenecks in its running time. Surprisingly, in a disk-based model checker, internal processing such as successor generation and state comparisons was sometimes dominating even the disk access times.

In this paper we present a parallel variant of the external memory directed model checking algorithm that improves on our earlier algorithm in two ways. Firstly, the internal workload is divided among different processors that can reside either on the same machine or on different machines. Secondly, we suggest an improved parallel duplicate detection scheme based on multiple processors and multiple hard disks. We show that under some realistic assumptions, we achieve a number of I/Os that is linear in the size of the explored model.

The paper is structured as follows. First we review large-scale parallel breadth-first search, the combined approach of external and parallel breadth-first search.
Then, we turn to directed model checking and the heuristics used in model checking. Next, we recall External A*, the external version of the A* algorithm that serves as a basis for including heuristics in the search. Afterwards, we propose Parallel External A* and provide algorithmic details for large-scale parallel A* search. Subsequently, we discuss some of the complexity aspects of the algorithm. Its application to distributed and external model checking domains is considered in the experimental part, where we have successfully extended the state-of-the-art model checker SPIN to include directed, external, and parallel search. The paper closes with further links to related work.

2 Large-Scale Parallel Breadth-First Search

In large-scale parallel breadth-first search [14], the entire search space is partitioned into different files. The hash address is used to distribute and to locate states in those files. As the considered state spaces, like the Fifteen-Puzzle, are regular permutation games, each state can be perfectly hashed to a unique index. Since all state spaces are undirected, in order to avoid regenerating explored states, frontier search [15] stores with each node its used operators, in the form of a bit-vector of the size of the available operator labels. This allows neighboring states that have already been explored to be distinguished from those that have not and, in turn, the list of already explored states to be omitted.

Hash-based delayed duplicate detection uses two orthogonal hash functions. When a state is explored, its children are written to a particular file based on the first hash value. In cases like the sliding-tile puzzle, the filename corresponds to parts of the state vector. For space efficiency it is favorable to perfectly hash the rest of the state vector to obtain a compressed representation. The representation as a permutation index can be computed in linear time w.r.t. the length of the vector.
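A minimal sketch of this two-hash-function scheme follows; the file layout, the text encoding, and the assumption that h2 perfectly hashes the remaining state vector are ours, and a real implementation would buffer its writes.

```python
import os

def flush_successors(children, h1, h2, out_dir, depth):
    """Write generated states to child files chosen by the first hash h1;
    the second hash h2 is stored as the sort key for later duplicate
    elimination."""
    for state in children:
        name = os.path.join(out_dir, "depth%d-file%d" % (depth, h1(state)))
        with open(name, "a") as f:
            f.write("%d\t%s\n" % (h2(state), state))

def merge_child_file(name):
    """Sort one child file on the h2 key; with a perfect hash, duplicates
    become adjacent and a single scan removes them."""
    with open(name) as f:
        rows = sorted(tuple(line.rstrip("\n").split("\t", 1)) for line in f)
    return [s for i, (k, s) in enumerate(rows) if i == 0 or k != rows[i - 1][0]]
```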
Figure 1 depicts the layered exploration on the external partition of the state space.

[Fig. 1. Externally stored state space with parent and child files]

Even on a single processor, multi-threading is important to maximize the performance of disk-based algorithms. The reason is that a single-threaded implementation will run until it has to read from or write to disk; at that point it will block until the I/O operation has completed. Moreover, hash-based delayed duplicate detection is well suited to being distributed: within an iteration, most file expansions and merges can be performed independently.

To realize parallel processing, a work queue is maintained, which contains parent files waiting to be expanded and child files waiting to be merged. At the start of each iteration, the queue is initialized to contain all parent files. Once all the neighbors of a child file are finalized, it is inserted in the queue for merging.

Each thread works as follows. It first locks the work queue. Two parent files conflict if they can generate states that hash to the same child file. The thread checks whether the first parent file conflicts with any other file currently being expanded. If so, it scans the queue for a parent file with no conflicts, swaps the position of that file with the one at the head of the queue, grabs the non-conflicting file, unlocks the queue, and expands the file. For each child file it generates, it checks whether all of the child's parents have been expanded. If so, it puts the child file in the queue for merging, and then returns to the queue for more work. If there is no more work in the queue, any idle threads wait for the current iteration to complete. At the end of each iteration, the work queue is initialized to contain all parent files of the next iteration.

3 Directed Model Checking

Directed model checking [7] incorporates heuristic search algorithms like A* [23] to enhance the bug-finding capability of model checkers, by accelerating the search for errors and finding (near to) minimal counterexamples. In that manner we can mitigate the state explosion problem and the long counterexamples provided by some algorithms like depth-first search, which is often applied in explicit-state model checking.

One can distinguish different classes of evaluation functions based on the information they try to exploit. Property-specific heuristics [7] analyze the error description as the negation of the correctness specification. In some cases the underlying methods are only applicable to special kinds of errors: a heuristic that prioritizes transitions that block a higher number of processes focuses on deadlock detection. In other cases the approaches are applicable to a wider range of errors. For instance, there are heuristics for invariant checking that extract information from the invariant specification, and heuristics based on already given error states. The second class has been denoted as structural [9], in the sense that source code metrics govern the search. This class includes coverage metrics (such as branch count) as well as concurrency measures (such as thread preference and thread interleaving). Next, there is the class of user heuristics that inherit guidance from the system designer in the form of source annotations, yielding preference and pruning rules for the model checker.

4 External A*

External A* [6] maintains the search horizon on disk. The priority queue data structure is represented as a list of buckets. In the course of the algorithm (cf. Figure 2), each bucket (i, j) contains all states u with path length g(u) = i and heuristic estimate h(u) = j.

[Fig. 2. Exploration in External A*]

As similar states have the same heuristic estimates, it is easy to restrict duplicate detection to buckets of the same h-value. By an assumed undirected state space problem graph structure, we can further restrict the candidates for duplicate detection: if all duplicates of a state with g-value i are removed with respect to the levels i, i − 1, and i − 2, then no duplicate state will remain for the entire search process. For breadth-first search in explicit graphs, this is in fact the algorithm of [22]. We consider each bucket as a different file that has an individual internal buffer. A bucket is active if some of its states are currently expanded or generated. If a buffer becomes full, then it is flushed to disk.
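The bucket discipline just described, together with the traversal order detailed next, can be summarized in a few lines. The generator below is an illustrative reading, not code from the paper.

```python
def bucket_order(f_max):
    """Visit the (g, h) bucket files in the order External A* finalizes
    them: f = g + h increases, and within one f-diagonal g increases."""
    for f in range(f_max + 1):
        for g in range(f + 1):
            yield (g, f - g)          # the bucket with h = f - g

def duplicate_scope(g, h):
    """With an undirected graph and consistent estimates, duplicates of
    bucket (g, h) need only be checked against the same-h buckets of the
    current and the two previous g-layers."""
    return [(g, h), (g - 1, h), (g - 2, h)]
```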
The algorithm maintains the two values gmin and fmin to address the correct buckets. The buckets of fmin are traversed for increasing gmin-value unless gmin exceeds fmin. Due to the increase of the gmin-value in the fmin bucket, a bucket is finalized when all its successors have been generated. Given fmin and gmin, the corresponding h-value is determined by hmax = fmin − gmin. According to their different h-values, successors are arranged into different horizon lists. Duplicate elimination is delayed. Since External A* simulates A* and changes only the order of elements to be expanded that have the same f-value, completeness and optimality are inherited from the properties of A*. The I/O complexity of External A* in an implicit unweighted and undirected graph with a monotone heuristic is bounded by O(sort(|E|) + scan(|V|)), where |V| and |E| are the numbers of nodes and edges in the explored subgraph of the state space problem graph, and scan(n) (sort(n)) is the number of I/Os needed to externally scan (sort) n elements.

For challenging exploration problems, external algorithms operate in terms of days and weeks. To improve fault tolerance, we have added a stop-and-resume option on top of the algorithm, which allows the continuation of the exploration in case of a user interrupt. An interrupt causes all open buffers to be flushed, and the exploration can continue with the active buffer at the additionally stored file pointer position. In this case, we only have to redo at most one state expansion. As duplicates are eliminated, generating a successor twice does not harm the correctness and optimality of the algorithm. As we flush all open buffers when a bucket is finished, in case of severe failures, e.g. to the power supply of the computer, we have to re-run the exploration for at most one bucket.

I/O-efficient directed model checking [11] applies variants of External A* to the validation of communication protocols. The tool IO-HSF-SPIN accepts large fractions of the Promela input language of the SPIN model checker. The paper extends External A* to weighted, directed graphs with non-monotone cost functions, as apparent in explicit-state model checking, and studies the scope for delayed duplicate detection within protocol verification domains.

5 Parallel External A*

The distributed version of External A*, Parallel External A*, is based on the observation that the internal work in each individual bucket can be parallelized among different processors. Due to the dynamic allocation of new objects in software model checking, our approach is also compatible with state vectors of varying length. We first discuss our method of disk-based queues to distribute the workload among different processes. Our approach is applicable both to a client-server based environment and to a single machine with multiple processors.

5.1 Disk-Based Queues

To organize the communication between the processors, a working queue is maintained on disk. The working queue contains the requests for exploring parts of a (g, h) bucket, together with the part of the file that has to be considered². For improving efficiency, we assume a distributed environment with one master and several slave processes³. Our approach applies both when each slave has its own hard disk and when they work together on one hard disk residing on the master. Message passing between the master and the slave processes is done purely through files, so that all processes can run independently.

² As processors may have different computational power and processes can dynamically join and leave the exploration, the number of state space parts under consideration does not necessarily have to match the number of processors. By utilizing a queue, one may also expect a processor to access a bucket multiple times. However, for the ease of a first understanding, it is simpler to assume that the jobs are distributed uniformly among the processors.

³ In our current implementation the master is in fact an ordinary process, defined as the one that finalized the work for a bucket.
For our algorithm, master and slaves work fully autonomously. We do not use spawning of child processes. Even if slave processes are killed, their work can be re-done by any other idle process that is available.

One file, which we call the expand-queue, contains all current requests for exploring a state set that is contained in a file. The filename consists of the current g and h values. In the case of larger files, file pointers for processing parts of a file are provided, to allow for better load balancing. There are different strategies to split a file, into equidistant parts or into chunks depending on the number and performance of logged-on slaves. As we want to keep the exploration process distributed, we split the file-pointer windows into equidistant parts of a fixed number of C bytes for the states to be expanded. For improved I/O, the number C is supposed to divide the system's block size B. As concurrent read operations are allowed by most operating systems, multiple processes reading the same file impose no concurrency conflicts.

The expand-queue is generated by the master process and is initialized with the first block to be expanded. Additionally, we maintain a total count of the number of requests, i.e., the size of the queue, and the current count of satisfied requests. Any logged-on slave reads a request and increases the count once it finishes. During the expansion process, it generates, in a subdirectory indexed by the slave's name, different files that are indexed by the g and h values of the successor states.

The other queue is the refine-queue, also generated by the master process once all processes are done. It is organized in a similar fashion as the expand-queue and allows slaves to request work. The refine-queue contains the filenames that have been generated above, namely the slave name (which does not have to match that of the current process), the block number, and the g and h values. For suitable processing, the master process will move the files from subdirectories indexed by the slave's name to ones that are indexed by the block number. As this is a sequential operation executed by the master thread, we require that changing the file locations is fast in practice, an assumption that is fulfilled in all modern operating systems.

In order to avoid redundant work, each processor eliminates the requests from the queue. Moreover, after finishing a job, it writes an acknowledgment to an associated file, so that each process can access the current status of the exploration and determine whether a bucket has been completely explored or sorted.

Since all communication between different processes is done through shared files, a proper mechanism for mutual exclusion is necessary. We utilized a rather simple but efficient method to avoid concurrent write accesses to the files. Whenever a process has to write to a shared file, e.g., to the expand-queue to dequeue a request, it issues an operating system move (mv) command to rename the file into <process ID>.expand-queue, where process ID is a unique number that is automatically assigned to every process that enters the pool. If the command fails, this implies that the file is currently being used by another process. Since the granularity of a kernel-level command is much finer than that of any program implemented on top of it, this technique performed remarkably well.
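In Python, the same rename-based locking trick looks roughly as follows (the file names are illustrative; os.rename is the system call behind mv):

```python
import os

def acquire(shared_name, pid):
    """Try to take exclusive ownership of a shared queue file by renaming
    it; the rename either succeeds as a single kernel operation or fails
    because another process currently owns the file."""
    private_name = "%s.%s" % (pid, shared_name)
    try:
        os.rename(shared_name, private_name)
        return private_name            # we may now read/write the queue
    except OSError:
        return None                    # held by someone else; retry later

def release(private_name, shared_name):
    os.rename(private_name, shared_name)   # put the queue back for others
```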
5.2 Sorting and Merging

For each bucket that is under consideration, we establish four stages in the algorithm. These phases are visualized in Figure 3 (top to bottom). The zig-zag curves visualize the sorted order of the states sequentially stored in the files. The sorting criterion is defined by the state's hash key, which dominates a low-level state comparison based on the compressed state descriptor.

In the exploration stage, each processor p flushes the successors with a particular g and h value to its own file (g, h, p). Each process has its own hash table and eliminates some duplicates already in main memory. The hash table is based on chaining, with chains sorted along the state comparison function. If the output buffer exceeds memory capacity, the process writes the entire hash table to disk. By the use of the sorting criterion as given above, this can be done with a mere scan of the hash table.

In the first sorting stage, each processor sorts its own file. In a serial setting, such sorting has a limited effect on I/O when using external merge-sort afterwards. In a distributed setting, however, we exploit the advantage that the files can be sorted in parallel. Moreover, the number of file pointers needed is restricted by the number of flushed buffers, which is illustrated by the number of peaks in the figure. Based on this restriction, we only need to perform a merge of the different sorted buffers, an operation linear in I/O.

In the distribution stage, a single processor distributes all states in the pre-sorted files into different files according to the hash value's range. This is a parallel scan with a number of file pointers equal to the number of files that have been generated. As all input files are pre-sorted, this is a mere scan. No all-including file is generated, keeping the individual file sizes small. This is of course a bottleneck for the parallel execution, as all processes have to wait until the distribution stage is completed. However, if we expect the files to be on different hard drives, traffic for file copying is needed anyway.

In the second sorting stage, processors sort the files with buffers pre-sorted w.r.t. the hash value's range, to find further duplicates. The number of peaks in each individual file is limited by the number of input files (= number of processors), and the number of output files is determined by the selected partitioning of the hash index range. The output of this phase consists of sorted and partitioned buffers. Using the hash index as the sorting key, we establish that the concatenation of the files is in fact totally sorted.

[Fig. 3. Stages of bucket expansions in Parallel External A*]
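The four stages can be mocked up in memory to see how they compose; everything below (names, the in-memory setting, hashing by key modulo the number of parts) is illustrative only.

```python
def bucket_stages(flushed_runs_per_proc, num_parts, key):
    """In-memory mock of the four stages of Fig. 3 for one (g, h) bucket.
    flushed_runs_per_proc[p] is the list of buffers process p flushed in
    the exploration stage; key plays the role of the hash sort key."""
    # first sorting stage: each process sorts its own flushed data
    own_files = [sorted(sum(runs, []), key=key) for runs in flushed_runs_per_proc]
    # distribution stage: one scan splits the pre-sorted files by hash range
    parts = [[] for _ in range(num_parts)]
    for run in own_files:
        for state in run:
            parts[key(state) % num_parts].append(state)
    # second sorting stage: sort each part; duplicates are now adjacent
    result = []
    for part in parts:
        last = object()
        for state in sorted(part, key=key):
            if key(state) != last:
                result.append(state)
                last = key(state)
    return result   # concatenation of the parts: totally sorted, no duplicates
```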
6 Complexity

The complexity of external memory algorithms is usually measured in terms of I/Os, which assumes an unlimited amount of disk space. Using this complexity measure, [6] shows that in general it is not possible to avoid the external sorting barrier, i.e., delayed duplicate detection: the lower bound on the I/O complexity of delayed duplicate bucket elimination in an implicit unweighted and undirected graph A* search with consistent estimates is Ω(sort(|E|)), where E is the set of explored edges in the search graph. Fewer I/Os can only be expected if structural state properties can be exploited. We will see that, by assuming a sufficient number of processors and file pointers, the I/O complexity is reduced to linear (i.e., O(scan(|E|)) = O(|E|/B) I/Os) by exploiting a hash-function based state space partition scheme. We assume that the hash function provides a uniform partitioning of the state space.

Recall that the complexity in the external memory model assumes no restriction on the capacity of the external device. In practice, however, we have certain restrictions, e.g. on the number of open file pointers per process. We may, however, assume that the number of processes is smaller than the number of file pointers. Moreover, by the given main memory capacity, we can also assume that the number of flushed buffers (the number of peaks w.r.t. the sorted order of the file) is smaller than the number of file pointers. Using this, we can in fact achieve a linear number of I/Os for delayed duplicate elimination. The proof is rather simple: the number of peaks k in each individual file is bounded either by the number of flushed buffers or by the number of processes, so that a simple scan with k file pointers suffices to finalize the sorting.

An important observation is that the more processors we invest, the finer the partitioning of the state space, and the smaller the individual file sizes in a partitioned representation. Therefore, a side effect of having more processors at hand is an improvement in I/O performance based on existing hardware resource bounds.
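The k-file-pointer scan in this argument is an ordinary k-way merge. A compact illustrative rendering, with in-memory iterators standing in for the file pointers:

```python
from heapq import merge

def finalize(presorted_runs):
    """One pass over a file with k pre-sorted 'peaks': merge the k runs
    and drop adjacent duplicates. The data is read exactly once, which is
    the scan(|E|) bound discussed above."""
    output, last = [], object()
    for state in merge(*presorted_runs):
        if state != last:
            output.append(state)
            last = state
    return output
```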
Moreover, the number of duplicates generated in Optical Telegraph is much more than in CORBA-GIOP. To evaluate the potential of the algorithm, we tested it on two different (rather small) infrastructures. In the first setting we used two 1 GHz Sun Solaris Workstations equipped with 512 megabyte RAM and a NFS mounted hard disk space. For the second setting we chose a Sun Enterprise System with four 750 MHz processors working with 8 gigabyte RAM and 30 gigabyte shared hard disk space. In both cases, we worked with a single hard disk, so that no form of disk parallelism was exploited. Throughout our experiments the sizes of individual processes remained less than 5% of the total space requirement. Moreover, we used the system time command to calculate the CPU running times. The compiler used is GCC v2.95.3 with default optimizations and -g and -pg options turned on for debugging and profiling information. In all of the test cases, we searched for the deadlock using number of active processes as the heuristic estimate. We depict all three parameters provided by the system: real (the total elapsed time), user (total number of CPU-seconds that the process spent in user mode) and system (total number of CPU-seconds that the process spent in kernel mode). The speedup in the columns is calculated by dividing the time taken by a serial execution by the time taken by the parallel execution. In the first set of experiments, the multi-processor machine is used. Table 1 and 2 depict the times4 for three different runs consisting of single process, 2 processes, and 3 processes. The space requirements by a run of our algorithm is approx. 2.1 GB, 5.2 GB, 21 GB, and 4.3 GB for GIOP 3-2, 4-1, 4-2, and Optical-9, respectively. For GIOP 4-1, we see a gain by a factor of 1.75 for two processors and 2.12 for three processors in the total elapsed time. For Optical Telegraph, this gain went up to 2.41, which was expected due to its complex internal processing. In actual CPU-time (user), we see an almost linear speedup that depicts the uniform distribution of internal workload and, hence, highlighting the potential of the presented approach. Tables 3 and 4 show our results in the scenario of two machines connected together via NFS. In GIOP 3-2, we observe a small speed-up of a factor of 1.08. In GIOP 4-1, this gain increased to about a factor of 1.3. When tracing this limited gain, we found that the CPUs were not used at full speed. The bottleneck turned out to be the underlying NFS layer that was limiting the disk accesses to only about 5 Megabytes/sec. This bottleneck can be removed by utilizing local hard disk space for successors generation and then sending the files to the file server using secure copy (scp) that allows a transfer rate of 50 Megabytes/sec. 4 The smallest given CPU time always corresponds to the process that established the error in the protocol first. Parallel External Directed Model Checking with Linear I/O 247 Table 1. 
Table 1. CPU time for Parallel External A* in GIOP on a multiprocessor machine

GIOP 3-2   1 process   2 processes          Speedup   3 processes                    Speedup
real       25m 59s     17m 30s / 17m 29s    1.48      15m 55s / 16m 6s / 15m 58s     1.64
user       18m 20s     9m 49s / 9m 44s      1.89      7m 32s / 7m 28s / 7m 22s       2.44
system     4m 22s      4m 19s / 4m 24s      0.98      4m 45s / 4m 37s / 4m 55s       0.92

GIOP 4-1   1 process   2 processes          Speedup   3 processes                    Speedup
real       73m 10s     41m 42s / 41m 38s    1.75      37m 24s / 34m 27s / 37m 20s    2.12
user       52m 50s     25m 56s / 25m 49s    2.04      18m 8s / 18m 11s / 18m 20s     2.91
system     10m 20s     9m 6s / 9m 15s       1.12      9m 22s / 9m 8s / 9m 0s         1.13

GIOP 4-2   1 process   2 processes          Speedup   3 processes                    Speedup
real       269m 9s     165m 25s / 165m 25s  1.62      151m 6s / 151m 3s / 151m 5s    1.78
user       186m 12s    91m 10s / 90m 32s    2.04      63m 12s / 63m 35s / 63m 59s    2.93
system     37m 21s     29m 44s / 30m 30s    1.25      30m 19s / 30m 14s / 29m 50s    1.24

Table 2. CPU time for Parallel External A* in Optical Telegraph on a multiprocessor machine

Optical-9  1 process   2 processes          Speedup   3 processes                    Speedup
real       55m 53s     31m 43s / 31m 36s    1.76      23m 32s / 23m 17s / 23m 10s    2.41
user       43m 26s     22m 46s / 22m 58s    1.89      15m 20s / 14m 24s / 14m 25s    3.01
system     5m 47s      4m 43s / 4m 18s      1.34      3m 46s / 4m 45s / 4m 40s       1.22

Table 3. CPU time for Parallel External A* in GIOP on two computers and NFS

GIOP 3-2   1 process   2 processes          Speedup
real       35m 39s     32m 52s / 33m 0s     1.08
user       11m 38s     6m 35s / 6m 34s      1.76
system     3m 56s      4m 16s / 4m 23s      0.91

GIOP 4-1   1 process   2 processes          Speedup
real       100m 27s    76m 38s / 76m 39s    1.3
user       31m 6s      15m 52s / 15m 31s    1.96
system     8m 59s      8m 30s / 8m 36s      1.05

Table 4. CPU time for Parallel External A* in Optical Telegraph on two computers and NFS

Optical-9  1 process   2 processes          Speedup
real       76m 33s     54m 20s / 54m 6s     1.41
user       26m 37s     14m 11s / 14m 12s    1.87
system     4m 33s      3m 56s / 3m 38s      1.26

In the Optical Telegraph, we see a bigger reduction, of about a factor of 1.41, because of the complex but small state vector and the greater dependence on internal computation. As in the former setting, the total number of CPU-seconds consumed by a process (user) in 1-process mode is reduced to almost half in 2-processes mode.

8 Related Work

There is much work on external search in explicit graphs that are fully specified with their adjacency lists on disk. In model checking software, the graphs are implicit. There is no major difference in the exposition of the algorithm of Munagala and Ranade [22] for explicit and implicit graphs; however, the precomputation and access efforts are by far larger for the explicit graph representation. Their breadth-first search algorithm has been improved by [21].

Even for implicit search, the body of literature is growing at a fast pace. Edelkamp and Schrödl [8] consider external route-planning graphs that are naturally embedded in the plane. This yields a spatial partitioning that is exploited to trade state exploration count for improved local access. Zhou and Hansen [30] impose a projection function to obtain buckets that control the expansion process in best-first search. The projection preserves the duplicate scope or locality of the state space graph, so that states outside the locality scope do not need to be stored. Korf [13] highlights different options to combine A*, frontier, and external search; his proposal is limited in that only any two options are compatible. Edelkamp [5] extends External A* with BDDs to perform an external symbolic BFS in an abstract space, followed by an external symbolic A* search in the original space that takes the former result as a lower bound to guide the search.
Zhou and Hansen [31] propose structure-preserving state space projections to obtain a reduced state space to be controlled on disk. They also propose external construction of pattern databases. The drawback of their approach is that it applies only to state spaces that have a very regular structure, something that is not available in model checking. In Stern and Dill's initial paper on external model checking in the Murφ verifier [27], variants of external breadth-first search are considered. In Bao and Jones [1], we see another, faster variant of the Murφ verifier with magnetic disk. They propose two techniques: one is based on partitioned hash tables, and the other on a chained hash table. They aim to reduce the delayed duplicate detection time by partitioning the state space, which, in turn, diminishes the size of the set to be checked. They claim their technique to be inherently serial, with little room for a distributed variant.

In the approach of Kristensen and Mailund [16], repeated scans over the search space are performed in a geometric scan-line fashion, with states arranged in the plane w.r.t. some progress measure based on a given partial order. The scan over the entire state space is memory efficient, as it only needs to store the states that participate in the transitions that cross the current scan position. These states are marked visited, and the scan over the entire state space is repeated to cope with states that are not reachable with respect to earlier scans. The dependency on a good progress measure hinders its applicability to model checking in general; they have applied it mainly to Petri-net based model checking, where the notion of time is used as a progress measure.

While some approaches to parallel and distributed model checking are limited to the verification of safety properties [2, 10, 17], other works propose methods for checking liveness properties expressed in linear temporal logic (LTL) [4, 18]. Recall that LTL model checking mainly entails finding accepting cycles in a state space, which is performed with the nested depth-first search algorithm. The correctness of this algorithm depends on the depth-first traversal of the state space. Since depth-first search is inherently sequential [25], additional data structures and synchronization mechanisms have to be added to the algorithm. These requirements can waste the resources offered by the distributed environment. Moreover, formally proving the correctness of the resulting algorithms is not easy. It is possible, however, to avoid these problems by using partition functions that localize cycles within equivalence classes. The methods for defining partitions described above can be used for this purpose, leading to a distributed algorithm that performs the second search in parallel. The main limiting factor is that scalability and load distribution depend on the structure of the model and the specification.

Brim et al. [4] discuss one such approach, where the SPIN model checker has been extended to perform nested depth-first search in a distributed manner. They propose to maintain a dependency structure for all the accepting states visited. The nested parts for these accepting states are then started as separate procedures based on the order dictated by the dependency structure. Lluch-Lafuente [18] improves on the idea of nested depth-first search. The proposed idea is to divide the state space into strongly connected components by exploiting the structure of the never-claim automaton of the specification property. The nested search is then restricted to the corresponding component. If, during exploration, a node that belongs to some other component is encountered, it is inserted in the visited list of its corresponding component. Unfortunately, there is little or almost no room to externalize the above two approaches: depth-first search lacks locality and is hence not suited to being externalized.

Stern and Dill [28] propose a parallel version of the Murφ model checker. They also use a scheme based on run-time partitioning of the state space and assigning different partitions to different processors. The partitioning is done by using a universal hash function that uniformly distributes the newly generated states.

9 Conclusion

Enhancing directed model checking is essential to improve error detection in software. The paper contributes the first study of combining external, directed, and parallel search to mitigate the state-explosion problem in model checking. We have shown a successful approach to extending the External A* exploration to a distributed environment, as apparent in multi-processor machines and workstation clusters. Exploration and delayed duplicate detection are parallelized without concurrent write access, which is often not available. Error trails provided by depth-first search exploration engines are often exceedingly lengthy. Employed with a lower-bound heuristic, the proposed algorithm yields counter-examples of optimal length and is, therefore, an important step towards easing error comprehension for the programmer / software designer. Under reasonable assumptions on the number of file pointers per process, the number of I/Os is linear in the size of the model, which means that the external work of exploring the model matches the complexity of scanning it.
The proposed idea is to divide the state space in strongly connected components by exploiting the structure of the never claim automaton of the specification property. The nested search is then restricted only to the corresponding component. If during exploration, a node that belongs to some other component is encountered, it is inserted in the visited list of its corresponding component. Unfortunately, there is little or almost no room externalize the above two approaches. Depth-first search lacks locality and hence not suited to be externalize. Stern and Dill [28] propose a parallel version of the Murφ model checker. They also use a scheme based on run-time partitioning of the state space and assigning different partitions to different processors. The partitioning is done by using a universal hash function that uniformly distributes the newly generated states. 9 Conclusion Enhancing directed model checking is essential to improve error detection in software. The paper contributes the first study of combining external, directed and parallel search to mitigate the state-explosion problem in model checking. We have shown a successful approach to extend the external A* exploration in a distributed environment, as apparent in multi-processor machines and workstation clusters. Exploration and delayed duplicate detection are parallelized without concurrent write access, which is often not available. Error trails provided by depth-first search exploration engines are often exceedingly lengthy. Employed with a lower-bound heuristic, the proposed algorithm yields counter-examples of optimal length, and is, therefore, an important step to ease error comprehension for the programmer / software designer. Under reasonable assumptions on the number of file pointers per process, the number of I/Os is linear in the size of the model, by means the external work of exploring the model matches the complexity of scanning it. 250 S. Jabbar and S. Edelkamp The approach is implemented on top of the model checker IO-HSF-SPIN and the savings for the single disk model are very encouraging. We see an almost linear speedup in the CPU-time and significant gain in the total elapsed time. Compared to the potential of external search, the models that we have looked at are considerably small. In near future, we expect to implement the multiple-disk version of the algorithm as mentioned in this text. To conduct empirical observations for external exploration algorithm is a time-consuming task. In very large state spaces algorithms can run for weeks. For example the complete exploration of the Fifteen Puzzle consumed more than three weeks. Given more time, we expect larger models to be analyzed. We also expect further fine-tuning to increase the speed-up that we have obtained. Moreover, the approach presented is particular to model checking only, and can be applied to other areas where searching in a large state space is required. Acknowledgments. The authors wish to thank Mathias Weiss for helpful discussions and technical support that made running of the presented experiments possible. References 1. T. Bao and M. Jones. Time-efficient model checking with magnetic disks. In Tools and Algorithms for the Construction and Analysis of Systems(TACAS), pages 526– 540, 2005. 2. S. Ben-David, T. Heyman, O. Grumberg, and A. Schuster. Scalable distributed on-the-fly symbolic model checking. In Formal methods in Computer-Aided Design (FMCAD), pages 390–404, 2000. 3. B. Bérard, A. F. M. Bidoit, F. Laroussine, A. Petit, L. 
4. L. Brim, J. Barnat, and J. Stříbrná. Distributed LTL model-checking in SPIN. In Workshop on Software Model Checking (SPIN), pages 200–216, 2001.
5. S. Edelkamp. External symbolic pattern databases. In International Conference on Automated Planning and Scheduling (ICAPS), pages 51–60, 2005.
6. S. Edelkamp, S. Jabbar, and S. Schrödl. External A*. In German Conference on Artificial Intelligence (KI), volume 3238, pages 226–240, 2004.
7. S. Edelkamp, S. Leue, and A. Lluch-Lafuente. Directed explicit-state model checking in the validation of communication protocols. International Journal on Software Tools for Technology Transfer, 5(2–3):247–267, 2004.
8. S. Edelkamp and S. Schrödl. Localizing A*. In National Conference on Artificial Intelligence (AAAI), pages 885–890, 2000.
9. A. Groce and W. Visser. Heuristic model checking for Java programs. International Journal on Software Tools for Technology Transfer, 6(4), 2004.
10. T. Heyman, D. Geist, O. Grumberg, and A. Schuster. Achieving scalability in parallel reachability analysis of very large circuits. In International Conference on Computer-Aided Verification (CAV), pages 20–35, 2000.
11. S. Jabbar and S. Edelkamp. I/O efficient directed model checking. In Verification, Model Checking and Abstract Interpretation (VMCAI), volume 3385, pages 313–329, 2005.
12. M. Kamel and S. Leue. Formalization and validation of the General Inter-ORB Protocol (GIOP) using PROMELA and SPIN. International Journal on Software Tools for Technology Transfer, 2(4):394–409, 2000.
13. R. Korf. Best-first frontier search with delayed duplicate detection. In National Conference on Artificial Intelligence (AAAI), pages 650–657, 2004.
14. R. E. Korf and P. Schultze. Large-scale parallel breadth-first search. In National Conference on Artificial Intelligence (AAAI), pages 1380–1385, 2005.
15. R. E. Korf and W. Zhang. Divide-and-conquer frontier search applied to optimal sequence alignment. In National Conference on Artificial Intelligence (AAAI), pages 910–916, 2000.
16. L. M. Kristensen and T. Mailund. Path finding with the sweep-line method using external storage. In International Conference on Formal Engineering Methods (ICFEM), pages 319–337, 2003.
17. F. Lerda and R. Sisto. Distributed-memory model checking with SPIN. In Workshop on Software Model Checking (SPIN), 1999.
18. A. Lluch-Lafuente. Directed Search for the Verification of Communication Protocols. PhD thesis, University of Freiburg, 2003.
19. K. L. McMillan. Symbolic Model Checking. Kluwer Academic Press, 1993.
20. K. L. McMillan. Symmetry and model checking. In M. K. Inan and R. P. Kurshan, editors, Verification of Digital and Hybrid Systems, pages 117–137. Springer-Verlag, 1998.
21. K. Mehlhorn and U. Meyer. External-memory breadth-first search with sublinear I/O. In European Symposium on Algorithms (ESA), pages 723–735, 2002.
22. K. Munagala and A. Ranade. I/O-complexity of graph algorithms. In Symposium on Discrete Algorithms (SODA), pages 87–88, 2001.
23. J. Pearl. Heuristics. Addison-Wesley, 1985.
24. D. A. Peled. Ten years of partial order reduction. In Computer-Aided Verification (CAV), volume 1427, pages 17–28, 1998.
25. J. H. Reif. Depth-first search is inherently sequential. Information Processing Letters, 20:229–234, 1985.
26. P. Sanders, U. Meyer, and J. F. Sibeyn. Algorithms for Memory Hierarchies. Springer, 2002.
27. U. Stern and D. L. Dill. Using magnetic disk instead of main memory in the Murφ verifier. In International Conference on Computer-Aided Verification (CAV), pages 172–183, 1998.
28. U. Stern and D. L. Dill. Parallelizing the Murφ verifier. In International Conference on Computer-Aided Verification (CAV), pages 256–278, 1997.
29. C. H. Yang and D. L. Dill. Validation with guided search of the state space. In Conference on Design Automation (DAC), pages 599–604, 1998.
30. R. Zhou and E. Hansen. Structured duplicate detection in external-memory graph search. In National Conference on Artificial Intelligence (AAAI), pages 683–689, 2004.
31. R. Zhou and E. Hansen. External-memory pattern databases using delayed duplicate detection. In National Conference on Artificial Intelligence (AAAI), pages 1398–1405, 2005.

Piecewise FIFO Channels Are Analyzable

Naghmeh Ghafari and Richard Trefler

School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

Abstract. FIFO systems consisting of several components that communicate via unbounded perfect FIFO channels arise naturally in modeling distributed systems. Despite well-known difficulties in analyzing such systems, they are of significant interest as they can describe a wide range of Internet-based communication protocols. Previous work has shown that piecewise languages play an important role in the study of FIFO systems. In this paper, we show that FIFO systems composed of piecewise components can in fact be analyzed algorithmically. We demonstrate that any FIFO system composed of piecewise components can be described by a finite-state, abridged structure, representing an expressive abstraction of the system. We present a procedure for building the abridged model and prove that this procedure terminates. We show that we can analyze the infinite computations of the more concrete model by analyzing the computations of the finite, abridged model. This enables us to check properties of FIFO systems, including safety properties of the components as well as a general class of end-to-end system properties. Finally, we apply our analysis method to an IP-telecommunication architecture to demonstrate the utility of our approach.

1 Introduction

Finite state machines that communicate over unbounded channels are used as a model of computation in the analysis of distributed protocols (cf. for example [10, 6, 1, 19, 12]). While the unboundedness of communication channels simplifies the modeling of the protocols, it complicates their analysis. Since one unbounded channel is sufficient to simulate the tape of a Turing machine, most interesting verification problems for this class of protocols are undecidable. However, a substantial effort has gone into identifying subclasses for which the verification problem is decidable, because this analysis is crucial in the design of safety-critical distributed systems (cf. [1, 2, 4, 5, 6, 8, 9, 12, 16, 17, 19]). In this paper, we show that by restricting attention to systems composed of a class of 'well-designed' components, automated system analysis is possible even when the components communicate over unbounded perfect FIFO channels. This work was inspired by studying real-world examples of distributed protocols, such as IP-telecommunication protocols [7]. In those protocols, it is particularly desirable that communication between peers (or components) be well-behaved.

Authors are supported in part by grants from OGS, NSERC of Canada, and Nortel Networks.
And in fact, many descriptions of components with well-behaved communication can be expressed by a subclass of the regular languages known as the piecewise languages (cf. [17, 7, 15, 25]). Piecewise regular languages play important roles in the study of communication protocols (cf. [9, 17]). Intuitively, a language is piecewise if it is the union of sets of strings, where each set is given by a regular expression of the form M0∗ a0 M1∗ a1 . . . Mn−1∗ an−1 Mn∗, in which each Mi is a subset of an alphabet Σ and each ai is an element of Σ. Surprisingly, these relatively straightforward expressions can be used to capture important system properties.
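Purely as an illustration (not part of the original text), the shape M0∗ a0 M1∗ a1 . . . Mn−1∗ an−1 Mn∗ translates directly into an ordinary regular expression, so membership in such a set can be checked with a standard regex engine; the alphabet and the particular Mi and ai below are invented for the example.

```python
import re

def piecewise_regex(blocks, final_set):
    """Build a regex for M0* a0 M1* a1 ... M(n-1)* a(n-1) Mn*.

    `blocks` is a list of (M_i, a_i) pairs, where M_i is a set of letters
    and a_i is a single letter; `final_set` is Mn.
    """
    star = lambda M: "[" + "".join(sorted(M)) + "]*" if M else ""
    body = "".join(star(M) + re.escape(a) for M, a in blocks)
    return re.compile("^" + body + star(final_set) + "$")

# The piecewise set (a + b)* a c* over the alphabet {a, b, c}:
lang = piecewise_regex([({"a", "b"}, "a")], {"c"})
assert lang.match("bbaaccc")       # in the language
assert not lang.match("cab")       # not in the language
```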
In this paper, we show that FIFO systems composed of communicating piecewise components are amenable to algorithmic analysis. It was shown in [17] that the limit language of the channel contents of such systems is regular. A method was also presented for calculating the limit language of systems with only a single channel and of a certain class of multiple-channel systems. Yet calculating the limit language of general multiple-channel systems remains an open problem. Moreover, the strings that represent the contents of the channels may be too long to be calculated exactly. Thus, a method of abridging channel contents (without losing key information) is required. First, we present a procedure to calculate, for each channel, a superset of the channel language associated with specific component states. We then use these supersets in calculating representations of the system computations. We note that, in general, limit languages do not represent system computations but rather the reachable state sets. Further, we show that our procedure is applicable to all FIFO systems composed of piecewise components. It is worth mentioning that while a string in the superset may never occur in a reachable system state, for the analysis technique presented in the current work, reachability of component states is exact.

We present a procedure that translates an n-process distributed state machine (DSM (n)) composed of piecewise components communicating over FIFO channels into an abridged distributed state machine (ADSM (n)). The ADSM (n) is closely related to DSM (n); however, it differs in that the contents of the unbounded channels are represented by piecewise regular expressions. We establish the finiteness of the abridged model by proving that the calculation procedure terminates. Furthermore, we show that a global component state, composed of the local states of the processes, is reachable in ADSM (n) if and only if the same component state is reachable in the corresponding DSM (n). The representation of channel contents by piecewise regular expressions in the context of global system transitions has allowed us to group together sets of actions that may be executed by one process from a given global state.

The main reason for abridging DSM (n) is to be able to reason about its infinite behavior by analyzing the behavior of the finite ADSM (n). We consider system properties expressed by a restricted, but expressive, class of Büchi automata. Here, the states of the Büchi automata represent the finite set of component states, and we require that the language of the automata be stuttering closed. We can then show that there is a computation of ADSM (n) that satisfies the automaton if and only if there is a computation of DSM (n) that satisfies the automaton. This procedure allows one to check properties of the DSM (n), including both local and global reachability properties as well as a general class of end-to-end system properties, thus showing that piecewise FIFO channels are analyzable.

1.1 Motivation

IP-telecommunication protocols are often prone to subtle errors because of indeterminacy in the scheduling of concurrent processes and communication latency. These errors can easily go undetected in testing and simulation due to their infrequency, yet they can cause major damage when they occur. Thus, it is desirable to formally verify that the protocols meet their specifications in all circumstances (cf. [1, 2, 5, 7, 12, 14, 19]). IP-telecommunication protocols can in many cases be effectively modeled as finite state machines communicating via unbounded channels, with enough generality to examine the concurrency issues involved. ECLIPSE, now called BoxOS, is a next-generation infrastructure for telecommunication services over IP developed at AT&T Labs (see [7, 15, 25] and [17] for related work). Essentially, a telephone call is represented by a list (or, more generally, a graph) of boxes, while communication between neighboring boxes is handled by perfect FIFO channels. At a sufficient level of abstraction, boxes may all be viewed as finite state transducers. Importantly, the language of the automata that model the communication behavior of these boxes is naturally a piecewise regular language. Communication in these protocols begins with an initiator trying to reach a given destination. A call is built recursively. The current endpoint begins the call initiation protocol with a chosen neighbor, the callee. If this initiation results in a stable connection, the callee becomes the new endpoint and the call construction continues. Call termination is required to proceed in reverse order and, in general, to begin at a call endpoint. Here, our focus is on an important class of properties that involve several boxes; for instance, that particular messages sent from a local box eventually reach a distant box. While communication with multiple neighbors significantly complicates the description of a box, the languages of box inputs, box outputs, and sequences of box states can each be given by piecewise expressions.

1.2 Previous Work

FIFO systems have played key roles in the description of distributed systems. A complete description of prior work is given in [17]. Brand and Zafiropulo [10] first showed that many important questions regarding FIFO channels are undecidable. Despite this, Pachl [19] described several scenarios that are tractable. In [10], Brand and Zafiropulo also defined a number of minimal properties that well-formed protocols are expected to satisfy and showed to what extent these properties can be ensured. The work of [3] addresses the problem of automatically validating protocols based on algorithms that can detect violations of such minimal properties. FIFO nets, a generalization of FIFO systems, are described in a survey [14], along with several decidability results that depend on the use of bounded languages. The work of [13] considers the use of semilinear regular expressions to represent bounded languages. We note that piecewise languages are not required to be bounded.
Several important decidability results have been given in the context of lossy channel systems [1, 2, 12]. For lossy systems, messages may be lost arbitrarily. Simple regular expressions are considered in [1] in the study of lossy channels. In contrast, our model is not lossy; some messages may be duplicated, but they may not be lost. The recent work on regular model checking [8, 16] (see also [9]) has focused on the analysis of parameterized systems, whereas in this paper we provide automated analysis techniques applicable to specific FIFO systems. Surprisingly, Boigelot and Godefroid [5] were able to show that BDD-based symbolic methods could be used to calculate the limit language of channel contents for many important programs. Further, [4] extends their work to consider sets of operations on channel contents, while [6] considers sequences of channel operations that preserve the regularity of channel languages. We have instead focused on the reachability of component states for a specific class of protocols that allow conditional write operations; it is these conditional operations that increase the expressiveness of our model. Piecewise regular languages are strictly more expressive than the piecewise testable languages of [23]. Further, piecewise languages have been described as Alphabetic Pattern Constraints [9]. These languages have also been considered in [20] and [21]. However, these works did not consider the use of piecewise languages in the analysis of FIFO systems.

2 Piecewise Languages

Let Σ be a finite alphabet, ℕ denote the set of natural numbers, and λ the empty string. Given two strings r1 and r2, the concatenation of r1 and r2 is denoted by r1 r2. If r∗ represents the Kleene closure of r, then r+ = r r∗. Let a ∈ Σ, l ∈ ℕ, and for i ∈ [0..l], ai ∈ Σ. In the sequel, a1 + a2 denotes the nondeterministic choice between a1 and a2. By +ai | P(ai), we denote the regular expression (a1 + . . . + am) consisting of those ai's that satisfy P.

Definition 1. [17] Thin piecewise regular (tpr) expressions are defined by the following grammar: r ::= λ | a | (a1 + . . . + al)+ | r1 r2. For example, (a + b)+ ac is a tpr expression but (ab)+ c is not. We consider tpr's of the form λ, a, and (a1 + . . . + al)+ to be atomic.

Definition 2. [17] Piecewise regular (pr) expressions generalize tpr expressions by allowing the inclusion of the + operator. For example, given thin piecewise regular expressions r1 and r2, r1 + r2 is a pr. It should be noted that (ab)+ c is in fact not piecewise.

Proposition 1. (cf. [17, 9]) Piecewise languages are star-free; they are closed under finite unions, finite intersections, concatenation, shuffle, projections (defined by letter-to-letter mappings), and inverse homomorphisms, but not under complementation and substitutions.

Definition 3. (cf. [17]) A piecewise automaton A = ((Σ, Q, δ, q⁰, F), ≤) is defined as follows: Q is a finite set of states; q⁰ ∈ Q is the initial state; ≤ is a partial order on Q; δ : Q × Σ → 2^Q is a non-deterministic transition relation such that if q′ ∈ δ(q, a) then q ≤ q′; F ⊆ Q is the set of accepting states. We sometimes omit F, in which case it is understood that F = Q. Given q ∈ Q and w ∈ Σ∗, δ(q, w) is defined as usual: δ(q, λ) = {q} and δ(q, wa) = {p | for some state r ∈ δ(q, w), p ∈ δ(r, a)}. For w ∈ Σ∗ we say that the piecewise automaton A accepts w iff δ(q⁰, w) contains a state in F. The language of A is defined as L(A) = {w ∈ Σ∗ | δ(q⁰, w) contains a state in F }.
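The following sketch renders Definition 3 as executable Python; it is our own illustration (the paper gives no implementation). States are integers, the partial order defaults to ≤ on integers, and acceptance is decided by tracking the set of states reachable on the input word.

```python
class PiecewiseAutomaton:
    """A sketch of Definition 3 (illustrative, not the paper's code)."""

    def __init__(self, delta, q0, accepting, leq=lambda p, q: p <= q):
        # delta: dict mapping (state, letter) -> set of successor states.
        # Every transition must be non-decreasing in the partial order.
        assert all(leq(q, p) for (q, _), succs in delta.items() for p in succs), \
            "transitions must respect the partial order"
        self.delta, self.q0, self.F = delta, q0, set(accepting)

    def accepts(self, word):
        current = {self.q0}                # subset construction on the fly
        for a in word:
            current = {p for q in current for p in self.delta.get((q, a), set())}
        return bool(current & self.F)

# Automaton for the tpr (a + b)+ c: state 0 --a,b--> 1 (loop), 1 --c--> 2.
A = PiecewiseAutomaton(
    delta={(0, "a"): {1}, (0, "b"): {1},
           (1, "a"): {1}, (1, "b"): {1}, (1, "c"): {2}},
    q0=0, accepting={2})
assert A.accepts("abbac")
assert not A.accepts("c")
```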
Proposition 2. [17] For every piecewise automaton A, there is a piecewise regular expression r such that L(A) = r, and for every piecewise regular expression r, there is a piecewise automaton A such that L(A) = r.

3 FIFO Channel Systems

In this section, we define DSM (n) and its abridged model, ADSM (n). We then present a procedure to construct ADSM (n) from a given DSM (n) description and prove that this procedure terminates. We also show the relationship between the computations of ADSM (n) and DSM (n). In the sequel, the Ai's are piecewise automata, Ai = ((Σi, Qi, δi, qi⁰), ≤i). These automata may read from a single incoming channel, write on a single outgoing channel, or, conditionally, read from a single incoming channel and write on a single outgoing channel. A distributed state machine with n piecewise automata is an asynchronous system with n² − n channels and is defined as follows.

Definition 4. For a set of piecewise automata {Ai}i∈[0..n−1], the DSM (n) = (Q, C, Σ, R, q⁰, δ) is given by:
– Q = ×i Qi is the component state set.
– C = {c0,1, . . . , cn−1,n−2} is a set of channels; ci,j is the channel from process i to process j.
– Σ = ∪c∈C Σc, where Σc is the alphabet of channel c.
– R = ×c∈C Σ∗c is the set of possible channel descriptions.
– q⁰ is the initial state.
– δ is the transition relation: δ ⊆ Q × ∪c,c′∈C {c?a, c!b, c?a → c′!d | a, b ∈ Σc and d ∈ Σc′} × Q, and is given below.

Intuitively, the transition relation δ is built up from the transition relations of the piecewise Ai's such that every transition in δ consists of exactly one transition in some δi. Then δ is a set of triples (q, op, q′), where q and q′ are in Q and op is a read, write, or conditional operation. A transition of the form (q, c?a, q′) represents a change of the component state q to q′ while removing an a from the head of channel c; the channel content must be of the form aw for this operation to be enabled. A transition of the form (q, c!b, q′) represents a change of the component state q to q′ while transforming the content of channel c from w to wb. A transition of the form (q, c?a → c′!d, q′) represents a change of the component state q to q′ while removing a from the head of channel c and appending d to the tail of channel c′. Since every transition in δ consists of exactly one transition in some δi, we may abuse notation by writing (qi, op, qi′) ∈ δ instead of (q, op, q′) ∈ δ, where qi and qi′ are in Qi. A global state of DSM (n) is composed of two parts: the component state, which represents the local states of the processes, and the contents of the channels.

Definition 5. A global state of DSM (n) is a tuple ψ = (q0, . . . , qn−1, r0,1, . . . , rn−1,n−2) with qi ∈ Qi and ri,j ∈ Σ∗i,j. The initial global state is q⁰ = (q0⁰, . . . , qn−1⁰, λ, . . . , λ). We use the following notation to refer to the elements of a global state: ψ(i) = qi and ψ(i, j) = ri,j. We assume that the alphabets of the different channels are pairwise disjoint. Thus, δi(qi, a) is a read transition if a ∈ Σj,i or a write transition if a ∈ Σi,j.
The global transition relation of DSM (n) = (Q, C, Σ, R, q⁰, δ) is a set G of triples (ψ, op, ψ′), where ψ and ψ′ are global states. We write ψ −op→ ψ′ to denote that ψ′ is a successor of ψ; (ψ, op, ψ′) ∈ G is given as follows:
– if (qi, ck,i?a, qi′) ∈ δ, then ψ −ck,i?a→ ψ′ provided that ψ(i) = qi, ψ′(i) = qi′, ψ(k, i) = a ψ′(k, i), for all j ≠ i, ψ′(j) = ψ(j), and for all (l, m) ≠ (k, i), ψ′(l, m) = ψ(l, m).
– if (qi, ci,k!b, qi′) ∈ δ, then ψ −ci,k!b→ ψ′ provided that ψ(i) = qi, ψ′(i) = qi′, ψ′(i, k) = ψ(i, k) b, for all j ≠ i, ψ′(j) = ψ(j), and for all (m, l) ≠ (i, k), ψ′(m, l) = ψ(m, l).
– if (qi, ck,i?a → ci,j!d, qi′) ∈ δ, then ψ −ck,i?a→ci,j!d→ ψ′ provided that ψ(i) = qi, ψ′(i) = qi′, ψ(k, i) = a ψ′(k, i), ψ′(i, j) = ψ(i, j) d, for all u ≠ i, ψ′(u) = ψ(u), and for all (l, m) ≠ (k, i) and (l, m) ≠ (i, j), ψ′(l, m) = ψ(l, m).
We write ψ → ψ′ when we do not distinguish the specific operation that causes the change of global states from ψ to ψ′.

Definition 6. A computation path in DSM (n) is a finite or infinite sequence ψ = ψ0 → ψ1 → . . . where ψ0 = q⁰ and for all i, ψi → ψi+1 ∈ G. All computation paths in DSM (n) are acceptable. L∗(DSM (n)) is a subset of (Q × R)∗ and consists of all the finite computations of DSM (n), and Lω(DSM (n)) is a subset of (Q × R)ω and consists of all the infinite computations of DSM (n). The language of DSM (n), denoted L(DSM (n)), is equal to L∗(DSM (n)) ∪ Lω(DSM (n)).
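As an executable restatement of the three rules above (a sketch of ours with invented names, not the paper's code), the global transition relation can be applied to explicit global states whose channel contents are plain strings:

```python
def dsm_step(state, op):
    """Apply one DSM(n) transition to a global state, per the three rules.

    `state` is (locals, channels): a dict of per-process local states and
    a dict mapping channel names to strings.  `op` is one of
    ('read', i, c, a, qi2), ('write', i, c, b, qi2), or
    ('cond', i, c, a, c2, d, qi2).  Returns the successor global state,
    or None if the operation is not enabled.
    """
    locs, chans = state
    kind = op[0]
    if kind == "read":
        _, i, c, a, qi2 = op
        if not chans[c].startswith(a):
            return None                      # head of channel must be `a`
        return ({**locs, i: qi2}, {**chans, c: chans[c][len(a):]})
    if kind == "write":
        _, i, c, b, qi2 = op
        return ({**locs, i: qi2}, {**chans, c: chans[c] + b})
    if kind == "cond":                       # read `a` from c, append d to c2
        _, i, c, a, c2, d, qi2 = op
        if not chans[c].startswith(a):
            return None
        chans2 = {**chans, c: chans[c][len(a):]}
        chans2[c2] = chans2[c2] + d
        return ({**locs, i: qi2}, chans2)

s0 = ({0: "p", 1: "q"}, {"c01": "", "c10": ""})
s1 = dsm_step(s0, ("write", 0, "c01", "a", "p"))
s2 = dsm_step(s1, ("cond", 1, "c01", "a", "c10", "b", "q"))
assert s2 == ({0: "p", 1: "q"}, {"c01": "", "c10": "b"})
```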
3.1 Construction of ADSM (n)

We give a definition of ADSM (n) that contains a procedural definition of its transition relation. This procedural definition describes how to build ADSM (n) in an automated way from the syntactic description of DSM (n). ADSM (n) is closely related to DSM (n): its component state set is the same as that of DSM (n), but the channel contents of DSM (n) are replaced by tpr expressions in ADSM (n). Let tpr(Σc) denote the set of tpr's over the alphabet Σc. Given a DSM (n), the ADSM (n) is defined as follows.

Definition 7. For a given DSM (n) = (Q, C, Σ, R, q⁰, δ), the ADSM (n) = (Q, T, q⁰, η, Φ) where
– Q = ×i Qi is the component state set.
– T = ×c∈C tpr(Σc) is the set of possible channel descriptions.
– q⁰ is the initial state.
– η ⊆ (Q × T) × (Q × T) is the transition relation and is given below.
– Φ denotes a fairness constraint on the transitions. The fairness constraint requires that a transition that only reads from a channel should not be allowed to read an empty channel: for all processes, if process i infinitely often reads a from channel cj,i, then process j must infinitely often write a on channel cj,i.

Definition 8. A global state of ADSM (n) is a tuple ξ = (q0, . . . , qn−1, t0,1, . . . , tn−1,n−2) with qi ∈ Qi and ti,j ∈ tpr(Σi,j). The initial global state is q⁰ = (q0⁰, . . . , qn−1⁰, λ, . . . , λ).

In order to motivate the definition of the transition relation of ADSM (n), we first explain how the contents of the channels are updated by a single transition. If there is a write self-loop operation (c!a) on a state of a process, we cannot say in advance how many times this transition is executed; the assumption here is that c!a is executed at least once. Thus, we represent the set of all write transitions (c!a) in DSM (n) by a single transition in ADSM (n) that writes a+ on the channel. Now assume another process has a self-loop read operation c?a. Again, we do not know in advance how many times this transition may be executed. Thus, the corresponding ADSM (n) also accepts an infeasible computation in which there are more read operations than the number of a's in the channel. However, this does not cause any problems, since ADSM (n) also accepts a similar computation in which the numbers of read and write operations are equal.

Example: ADSM (n) Transition Relation. Consider the ith process in DSM (n), Ai. Assume Ai, on state s, has only one self-loop transition, with a conditional operation cj,i?a → ci,k!b. Let ξ = (. . . , si, . . . , tj,i, ti,k, . . .) be a global state of the corresponding ADSM (n). Let ξ′ = (. . . , si′, . . . , tj,i′, ti,k′, . . .) be a possible next global state of ξ that corresponds to the self-loop conditional transition cj,i?a → ci,k!b on state s of Ai. It is clear that si′ = si. The following shows how the contents of channels cj,i and ci,k are updated in ξ′.

Channel ci,k:
– If ti,k = w b+, then there is no change in the content of ci,k; thus ti,k′ = ti,k.
– If ti,k ≠ w b+, then b+ is appended to the tail of ci,k; thus ti,k′ = ti,k b+.

Channel cj,i:
– If tj,i = a w, then a is read and the content of cj,i transforms to w.
– If tj,i = (a + a1 + . . . + am)+ w, then there are four possible values for tj,i′:
• The head of tj,i may represent a string that starts with a followed by another a. Thus, after reading one a the content of channel cj,i can still be represented by (a + a1 + . . . + am)+ w; as a result, tj,i′ = tj,i.
• The head of tj,i may represent a string of length one, a. Thus, by reading one a the content of channel cj,i transforms to w.
• The head of tj,i may represent a string that starts with a followed by a string that does not contain any a's. Thus, the content of cj,i transforms to (a1 + . . . + am)+ w.
• The head of tj,i may represent a string that starts with a, followed by a string consisting of letters from the set {a1, . . . , am}, followed by a string that contains a's. Thus, the content of channel cj,i transforms to (a1 + . . . + am)+ (a + a1 + . . . + am)+ w.
End of Example.
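A small sketch (our own illustration, with an invented representation) of these update rules: a tpr is modeled as a list of atoms, where a single letter stands for itself and a frozenset S stands for (s1 + . . . + sk)+; reading a letter yields the set of possible successor tprs given by the four cases above, and writing b+ absorbs into an existing trailing b+ atom.

```python
def read_a(tpr, a):
    """All possible tprs for a channel after reading `a` (sketch).

    Returns [] if `a` cannot be at the head of the channel.
    """
    if not tpr:
        return []
    head, rest = tpr[0], tpr[1:]
    if head == a:                                  # tpr = a w : exact removal
        return [rest]
    if isinstance(head, frozenset) and a in head:  # tpr = (a + a1 + ... + am)+ w
        others = head - {a}
        out = [tpr, rest]                          # still (..)+ w ; or head was just `a`
        if others:
            out.append([frozenset(others)] + rest)          # a, then no more a's
            out.append([frozenset(others), head] + rest)    # a, others, a's again
        return out
    return []

def write_bplus(tpr, b):
    """Append b+ to a tpr, absorbing into a trailing b+ atom if present."""
    atom = frozenset({b})
    return tpr if tpr and tpr[-1] == atom else tpr + [atom]

t = [frozenset({"a", "x"})]                        # the tpr (a + x)+
succs = read_a(t, "a")
assert [frozenset({"x"})] in succs and [] in succs
assert write_bplus([], "b") == [frozenset({"b"})]
assert write_bplus([frozenset({"b"})], "b") == [frozenset({"b"})]
```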
The representation of channel contents by thin piecewise regular expressions in the context of global system transitions allows us to group together sets of transitions that may be executed by one process from a given global state. For the sake of clarity, we present read and write transitions in the format of conditional transitions: a write transition is presented as a conditional transition that reads a dummy message from one of the incoming channels, and a read transition is presented as a conditional transition that writes a dummy message on an outgoing channel.

Let αj denote the head of channel cj,i. Let βj,k ⊆ αj be the set of letters at the head of channel cj,i that enable a set of self-loop conditional transitions writing on ci,k at state q of process Ai: βj,k = {bji ∈ Σj,i | there is an e ∈ Σi,k and q ∈ δi(q, (bji, e))}. Let β′j,k ⊆ αj be the set of letters at the head of channel cj,i that enable a set of conditional transitions to other states writing on ci,k at state q of Ai: β′j,k = {bji ∈ Σj,i | there is an e ∈ Σi,k and q′ ∈ δi(q, (bji, e)) and q′ ≠ q}. Let βi,k = ∪j βj,k, β′i,k = ∪j β′j,k, and β′j,i = ∪k β′j,k. In the sequel, let {eik} be the set of letters that may be written on ci,k due to a set of enabled self-loop conditional transitions at state q of Ai, and εk = +eik | for some b ∈ βi,k and q ∈ δi(q, (b, eik)). Let {e′ik} be the set of letters that may be written on ci,k due to a set of enabled conditional transitions to other states at state q of Ai, and ε′k = +e′ik | for some b′ ∈ β′i,k, q′ ∈ δi(q, (b′, e′ik)) and q′ ≠ q.

The transition relation η is defined as follows: ξ → ξ′ ∈ η iff for some i ∈ [0..n − 1], for all l ≠ i, ξ(l) = ξ′(l), and for all l ≠ i and k ≠ i, ξ(l, k) = ξ′(l, k), and
– ξ′(i) ≠ ξ(i), and a single incoming channel is updated by the removal of a single letter, and for a single outgoing channel such as ci,k, e′ik is added to the tail of ξ(i, k). The following shows how the contents of an incoming channel such as cj,i are updated:
• if a ∈ β′j,k, then a ξ′(j, i) = ξ(j, i) (the head a is removed exactly), or
• if a ∈ β′j,k, r ∈ tpr(Σj,i), and ξ(j, i) = (a + a1 + . . . + am)+ r, then ξ′(j, i) = ξ(j, i), or
• if a ∈ β′j,k, r ∈ tpr(Σj,i), and ξ(j, i) = (a + a1 + . . . + am)+ r, then ξ′(j, i) = r, or
• if a ∈ β′j,k, r ∈ tpr(Σj,i), and ξ(j, i) = (a + a1 + . . . + am)+ r, then ξ′(j, i) = (a1 + . . . + am)+ r, or
• if a ∈ β′j,k, r ∈ tpr(Σj,i), and ξ(j, i) = (a + a1 + . . . + am)+ r, then ξ′(j, i) = (a1 + . . . + am)+ (a + a1 + . . . + am)+ r; or
– ξ′(i) = ξ(i), and a set of incoming channels is updated by the removal of a set of letters, and a set of outgoing channels, such as ci,k, is updated by writing the corresponding letters: for t ∈ tpr(Σi,k), if ξ(i, k) = t (εk)+, then ξ′(i, k) = ξ(i, k); otherwise ξ′(i, k) = ξ(i, k) (εk)+. The following shows how the contents of incoming channels, such as cj,i, are updated:
• if b ∈ βj,i, r ∈ tpr(Σj,i), and ξ(j, i) = b r, then ξ′(j, i) = r, or
• if r ∈ tpr(Σj,i), ξ(j, i) = (b1 + . . . + bu)+ r, and βj,i ∩ {b1, . . . , bu} = ∅, then ξ′(j, i) = ξ(j, i), or
• if r ∈ tpr(Σj,i), ξ(j, i) = (b1 + . . . + bu)+ r, and βj,i ∩ {b1, . . . , bu} ≠ ∅, then ξ′(j, i) = r, or
• if r ∈ tpr(Σj,i), ξ(j, i) = (b1 + . . . + bu)+ r, and {b1, . . . , bu} \ βj,i = {d1, . . . , dv}, then ξ′(j, i) = (d1 + . . . + dv)+ r, or
• if r ∈ tpr(Σj,i), ξ(j, i) = (b1 + . . . + bu)+ r, and {b1, . . . , bu} \ βj,i = {d1, . . . , dv}, then ξ′(j, i) = (d1 + . . . + dv)+ (b1 + . . . + bu)+ r.

Definition 9. A computation path in ADSM (n) is a finite or infinite sequence ξ = ξ0 → ξ1 → . . . where ξ0 = q⁰ and for all i, ξi → ξi+1 ∈ η. L∗(ADSM (n)) is a subset of (Q × T)∗ and consists of all the finite computations of ADSM (n), and Lω(ADSM (n)) is a subset of (Q × T)ω and consists of all the infinite and fair computations of ADSM (n). The language of ADSM (n), denoted L(ADSM (n)), is equal to L∗(ADSM (n)) ∪ Lω(ADSM (n)).

Example: Two Automata with Two Channels. This example illustrates how the tpr's are updated in the calculation of the abridged model of a DSM (2) when there are no changes in the component states. Let A1 = ((Σ, P, δ1, p⁰), ≤1) and A2 = ((Σ, Q, δ2, q⁰), ≤2) be two piecewise automata, and let Σ = Σ1,2 ∪ Σ2,1. The composite system of the two automata A1 and A2 with two channels is defined as DSM (2) = (P × Q, {c1,2, c2,1}, Σ, Σ∗1,2 × Σ∗2,1, (p⁰, q⁰, λ, λ), δ). For i ∈ [1..u], assume that on the states from which processes A1 and A2 do not have any transitions to other states there are a set of self-loop conditional transitions in A1, c2,1?di → c1,2!ai, and a set of self-loop read and write transitions in A2, c1,2?bi and c2,1!ei, respectively.

[Fig. 1. Partial representation of global states in ADSM (2) if α = β and γ = ζ: a transition diagram over the four global states (p, q, −, −), (p, q, −, e+), (p, q, a+, −), and (p, q, a+, e+), with edges labeled by the process (A1 or A2) taking the step.]
Consider the computation path (p0, q0, r0, s0) → (p1, q1, r1, s1) → . . . in ADSM (2), where ri ∈ tpr(Σ1,2) and si ∈ tpr(Σ2,1). Because of the ordering relations on P and Q, for some i and all j with i < j, pj = pi and qj = qi. Let α = {a ∈ Σ1,2 | for some d ∈ Σ2,1, δ1(pi, (d, a)) = pi} = {a1, . . . , ak}, β = {b ∈ Σ1,2 | δ2(qi, b) = qi} = {b1, . . . , bl}, γ = {d ∈ Σ2,1 | for some a ∈ Σ1,2, δ1(pi, (d, a)) = pi} = {d1, . . . , dm}, and ζ = {e ∈ Σ2,1 | δ2(qi, e) = qi} = {e1, . . . , en}. First assume ri = λ and si = λ. Let a = +α, b = +β, d = +γ, and e = +ζ. Figure 1 shows the possible transitions in the mentioned path starting from (pi, qi, λ, λ), supposing α = β and γ = ζ. Since the component state (pi, qi) stays the same, the superscript i is not shown in the figure; the symbol λ is shown as '−'. As the figure shows, according to the transition relation of ADSM (n), a write operation performed by process A2 causes a transition from global state (p, q, λ, λ) to (p, q, λ, e+). Process A2 can continue writing on channel c2,1, or process A1 can read from c2,1 and write on c1,2. In the latter case, channel c2,1 may become empty after some reads, depicted by a transition to state (p, q, a+, λ), or it may still contain some letters, depicted by a transition to state (p, q, a+, e+). Figure 2 illustrates the possible transitions supposing β ⊂ α, α − β ≠ ∅, and γ = ζ; let f = +(α − β). The state machine corresponding to the case where γ ⊂ ζ can be constructed similarly. If si ≠ λ, then there are only two cases to consider: A2 can only add at most one atomic expression to si, namely (e1 + . . . + en)+, and A1 can only decrease the length of si, whether to λ or not. In both cases there are only a finite number of ancestors. End of Example.

[Fig. 2. Partial representation of global states in ADSM (2) if β ⊂ α and γ = ζ: a transition diagram over the global states (p, q, −, −), (p, q, a+, −), (p, q, f+, −), (p, q, a+f+, −), (p, q, −, e+), (p, q, a+, e+), (p, q, f+, e+), and (p, q, a+f+, e+), with edges labeled by the process (A1 or A2) taking the step.]

For the global state (q0, . . . , qn−1, t0,1, . . . , tn−1,n−2) we use the notation (q, t) to represent its component state and channel contents. The following lemma shows that the next-state relation of ADSM (n) is finite.

Lemma 1. If DSM (n) = (Q, C, Σ, R, q⁰, δ) and ADSM (n) = (Q, T, q⁰, η, Φ) is the abridged model of DSM (n), then, given (q, t) ∈ Q × T, the set of (q′, t′) ∈ Q × T such that (q, t) → (q′, t′) ∈ η is finite.

Proof: This follows from the finiteness of the Ai's and Σ, and from the definition of the transition relation η. □
Given DSM (n), its abridged model ADSM (n) is constructed recursively: for each global state ξ we calculate the set {ξ′ | ξ → ξ′ ∈ η}. Let G be the current set of global states {ξ} of ADSM (n) together with the transition connections between them; G represents the portion of ADSM (n) that has been calculated so far. Create Z, which consists of the component states of G plus sets of letters replacing the tpr's of G. The set Z is used as a termination test for the procedure of calculating G, i.e., it helps to determine whether a global state with a new component state is going to appear in the further calculation of G. Create the initial set of letters from a tpr as follows: start at the head of the tpr representing the content of channel ci,j. Include in this set all those letters in the head of the tpr for which Aj has a transition. If the tpr has no such letters, then stop: Aj has no more inputs from channel ci,j. As long as Aj has a transition for some letters in the head of the tpr, remove the head of the tpr and continue calculating the set of letters from the now shorter tpr. This process clearly terminates, either because the tpr becomes empty or because there is a limit on the different possible reads that can be performed from that channel by Aj. Apply the above process for all channels and component states. Then update Z according to the transition relation of DSM (n). If a conditional transition is triggered, e.g., Ai reads an a and then writes b, record this by adding b to the appropriate outgoing channel. If any of these transitions results in a new component state, then stop: the calculation of ADSM (n) is not finished yet; destroy Z and continue calculating G, i.e., continue calculating ADSM (n). Otherwise, continue to calculate Z. This process terminates, since no letters are ever removed from the set associated with the tpr representing the contents of the channels for any given component state of Z; therefore, these sets stop growing after a finite number of updates. If, in the process of updating Z, none of the transitions results in a new component state and the sets of letters stop growing, it is implicit that no more global states with new component states will appear in the further calculation of G.

Theorem 1. If DSM (n) = (Q, C, Σ, R, q⁰, δ), the procedure for generating the abridged model ADSM (n) = (Q, T, q⁰, η, Φ) terminates.

Proof: In order to prove the termination of the procedure for generating ADSM (n), we have to prove that the process of calculating G terminates. A new Z is only created if one of the transitions results in a new component state. This only happens a finite number of times, since there are only finitely many component state configurations, given that the states of any single piecewise automaton come with a partial order that is respected by the transition relation of DSM (n) and therefore of ADSM (n). Thus, termination of the calculation of Z implies that the calculation of G has reached a point from which there are no more global states with new component states left to be explored. □

In the procedure of calculating ADSM (n), after calculating all the global states with distinct component states, any new global state is only created through updates to the tpr's that represent the contents of the channels. Consider channel ck,i. Assume state s is a state from which process Ak does not have any transitions to other states. Let εi = +eki | for some b at the head of Ak's incoming channels, s ∈ δk(s, (b, eki)). Process Ak can only update the contents of ck,i by appending (εi)+ to the tail of its associated tpr. Since there is a limit on the different possible (conditional) writes on ck,i that can be performed by Ak, there will be a finite number of εi's. Process Ai can only decrease the length of the tpr that represents the contents of channel ck,i, and this only happens a finite number of times. Thus, there will be a finite number of updates to the tpr's in the further calculation of the global states of ADSM (n). This was also illustrated in the previous example.
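The construction just described can be phrased as a standard worklist fixpoint; the sketch below is our own (the Z bookkeeping is elided, and the toy successor function, which mimics the two-automata example, is an assumption): it simply explores until no new global state appears, which, by Lemma 1 and the stabilization argument above, is guaranteed to terminate.

```python
def build_adsm_states(q0, eta_successors, limit=100_000):
    """Worklist construction of the global state graph G of ADSM(n).

    `eta_successors` maps a (hashable) global state to its finitely many
    eta-successors (Lemma 1).  Exploration stops when no new global
    state appears; the `limit` is a safety net for this sketch only.
    """
    G, edges, work = {q0}, [], [q0]
    while work:
        if len(G) > limit:
            raise RuntimeError("state bound exceeded (sketch only)")
        xi = work.pop()
        for xi2 in eta_successors(xi):
            edges.append((xi, xi2))
            if xi2 not in G:
                G.add(xi2)
                work.append(xi2)
    return G, edges

# Toy eta mimicking Fig. 1: states (p, q, t12, t21); A2 writes e+ on
# c2,1 (absorbing), and A1 may turn an e+ on c2,1 into a+ on c1,2.
def eta(s):
    p, q, t12, t21 = s
    out = [(p, q, t12, "e+")]
    if t21 == "e+":
        out += [(p, q, "a+", "e+"), (p, q, "a+", "")]
    return out

G, E = build_adsm_states(("p", "q", "", ""), eta)
assert ("p", "q", "a+", "") in G        # one of the four states of Fig. 1
```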
Let ψ = ψ0 → ψ1 → . . . be a computation of DSM (n) and ξ = ξ0 → ξ1 → . . . be a computation of ADSM (n). ξ and ψ are two corresponding computations if for all i, ψ0(i) = ξ0(i), and for every global state with a distinct component state in ψ, such as ψk, there exists a corresponding global state in ξ, ξg, such that for all i, ψk(i) = ξg(i). In other words, the component states of the corresponding global states must be identical. In addition, the order of appearance of the corresponding global states in two corresponding computations must be the same.

Lemma 2. For every computation of DSM (n), there exists a corresponding computation of its abridged model ADSM (n), and for every computation of ADSM (n), there exists a set of corresponding computations of DSM (n).

4 Automated Analysis

The main reason for abridging DSM (n) is to be able to reason about its infinite computations by analyzing the computations of the finite ADSM (n). According to the construction procedure of ADSM (n), all the appropriate channel contents of DSM (n) are represented by a set of tpr's in ADSM (n). If ψk and ξg are two corresponding global states of DSM (n) and ADSM (n), respectively, and the channel contents are given by r = (ψk(0, 1), ψk(0, 2), . . . , ψk(n − 1, n − 2)) and t = (ξg(0, 1), ξg(0, 2), . . . , ξg(n − 1, n − 2)), then r ∈ t denotes that each ψk(i, j) is contained in ξg(i, j).

Lemma 3. For every reachable state (q, r) of DSM (n), there exists a reachable state (q, t) of ADSM (n) with r ∈ t, and for every reachable state (q, t) of ADSM (n), there exists a reachable state (q, r) of DSM (n) with r ∈ t.

As explained, the component states in the global states of ADSM (n) are composed of the reachable component states of DSM (n). Since ADSM (n) is a finite-state system, reachability analysis can be performed by an exhaustive search of its state space. The computations of ADSM (n) satisfy a property S if and only if there is no computation x of ADSM (n) such that ADSM (n), x |= ¬S. We use a standard automata-theoretic technique to decide this problem [24]. The technique consists of creating an automaton on infinite strings, B¬S (cf. [11] and [18]), which accepts only those strings that satisfy the property ¬S. We combine the structure ADSM (n) with B¬S to form the product automaton ADSM (n) × B¬S. This is an automaton on infinite strings whose language is empty if and only if the computations of ADSM (n) satisfy the property S.

A Büchi automaton is an automaton that recognizes infinite strings. A Büchi automaton over the alphabet Σ is of the form B = (Q, q⁰, ∆, F) with finite state set Q, initial state q⁰ ∈ Q, transition relation ∆ ⊆ Q × Σ × Q, and a set F ⊆ Q of accepting states. A run of B on an ω-word α = α(0)α(1) . . . from Σω is a sequence σ(0)σ(1) . . . such that σ(0) = q⁰ and (σ(i), α(i), σ(i + 1)) ∈ ∆ for i ≥ 0. The run is called accepting if it satisfies the fairness constraint F, i.e., some state of F occurs infinitely often on the run. B accepts α if there is an accepting run of B on α. The ω-language recognized by B is denoted by L(B) = {α ∈ Σω | B accepts α}. We consider system properties expressed by a restricted class of Büchi automata. Here, a Büchi automaton for DSM (n) has a fixed set of possible states, at most one for each component state of DSM (n). Since the set of computations of ADSM (n) is a superset of the set of computations of DSM (n), we require that the language of each Büchi property automaton be stuttering closed.
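For illustration (this is the standard construction, not code from the paper), the product-and-emptiness check can be sketched as follows: the Büchi automaton reads the component-state label of each system state it moves into, and the product language is nonempty exactly when some accepting product state is both reachable and on a cycle.

```python
def product_nonempty(sys_init, sys_succ, label, b_init, b_delta, b_acc):
    """Emptiness check for ADSM(n) x B_{not S} (illustrative sketch).

    `sys_succ` maps a system state to its successors, `label` maps a
    system state to the letter the automaton reads, and `b_delta` maps
    (automaton state, letter) to a set of automaton states.
    """
    def succ(ps):                       # product successors
        s, b = ps
        return [(s2, b2) for s2 in sys_succ(s)
                for b2 in b_delta.get((b, label(s2)), set())]

    def reach(sources):                 # forward reachability
        seen, work = set(sources), list(sources)
        while work:
            for n in succ(work.pop()):
                if n not in seen:
                    seen.add(n)
                    work.append(n)
        return seen

    for ps in reach([(sys_init, b_init)]):
        if ps[1] in b_acc and ps in reach(succ(ps)):   # accepting, on a cycle
            return True                                # some run violates S
    return False

# Toy system a <-> b; B accepts words containing infinitely many b's.
succs = {"a": ["b"], "b": ["a"]}
delta = {(0, "a"): {0}, (0, "b"): {1}, (1, "a"): {0}, (1, "b"): {1}}
assert product_nonempty("a", lambda s: succs[s], lambda s: s, 0, delta, {1})
```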
Lemma 4. Let B be a stuttering closed Büchi property automaton for DSM (n). For every computation in Lω(ADSM (n)) ∩ L(B), there exists a corresponding computation in Lω(DSM (n)) ∩ L(B), and for every computation in Lω(DSM (n)) ∩ L(B), there exists a corresponding computation in Lω(ADSM (n)) ∩ L(B).

5 Analysis of an IP-Telecommunication Architecture

BoxOS is AT&T's virtual telecommunication network based on IP [7, 15, 25]. In this architecture, a telephone call is represented by a set of boxes, representing telephones and call features, that communicate over possibly unbounded, perfect communication channels. At a sufficient level of abstraction, each box represents a finite state transducer. Figure 3 describes part of a transparent box that represents a communication template that all telephony features should implement. The transparent box communicates with two neighbors across four separate channels. Messages to/from the upstream (initiating) caller are sent/received via the i channels. Messages to/from the downstream (receiving) callee are sent/received via the o channels. Importantly, the language of the automaton that models the behavior of the transparent box (as shown in Figure 3) is a piecewise regular language.

[Fig. 3. Template feature box: a state machine with states INIT, LINKING1, LINKING2, LINKING3, LINKED, ERROR, and END, and transitions labeled i?Setup, o!Setup, i?Status, o?unavail, o?unknown, o?avail, o?gone, i!avail, o?Status, i!Status, and o!Status.]

It should be noted that a telephone call may be represented by a DSM (n), and Figure 3 depicts one of the piecewise automata in the telephone call. In our framework, different sets of properties can be specified that establish the correct behavior of the transparent box. A safety property can be verified by solving a reachability problem based on the negation of the safety property, for example, checking the reachability of a dedicated error state. A class of end-to-end temporal system properties specifies, for instance, that if a message is sent from one end, it will eventually be received at the other end: Always(Send ⇒ Eventually Receive), whose negation is Eventually(Send and Always not Received) [22]. For example, if a setup message is sent from the initiating caller, it will eventually be received by the callee. Another class of round-trip properties ensures that for every request there will eventually be a reply. For example, if a caller places a call (sends a setup message) and does not tear it down, it eventually receives one of the outcome signals 'unknown', 'unavail', or 'avail' from downstream. It is worth noting that representing the contents of the channels by tpr's allows the specification of a wider range of channel properties, such as the existence of a specific message in a channel. A more thorough analysis of these properties is left for future work.

6 Summary and Future Work

We have presented an automated procedure for analyzing properties of FIFO systems of piecewise processes that occur naturally in the description of IP-telecommunication architectures. Further, it is evident that communication protocols must be 'well-designed' or satisfy some similar notion. Such distributed systems are prone to errors, and our analysis techniques can be used to check for the presence of errors. In the future, we are interested in incorporating our analysis technique into standardized analysis tools and in developing extensions of our techniques that are applicable to non-piecewise models.
References

1. P. A. Abdulla, A. Annichini, and A. Bouajjani. Symbolic verification of lossy channel systems: Application to the bounded retransmission protocol. In Proc. of TACAS'99, LNCS 1579, pages 208–222, 1999.
2. P. A. Abdulla and B. Jonsson. Verifying programs with unreliable channels. In Proc. of LICS'93, pages 160–170, 1993.
3. P. Argon, G. Delzanno, S. Mukhopadhyay, and A. Podelski. Model checking for communication protocols. In Proc. of SOFSEM'01, LNCS 2234, 2001.
4. B. Boigelot. Symbolic methods for exploring infinite state spaces. PhD thesis, Université de Liège, 1998.
5. B. Boigelot and P. Godefroid. Symbolic verification of communication protocols with infinite state spaces using QDDs. FMSD, 14(3):237–255, 1999.
6. B. Boigelot, P. Godefroid, B. Willems, and P. Wolper. The power of QDDs. In Proc. of the 4th Int. Symp. on Static Analysis, LNCS 1302, pages 172–186, 1997.
7. G. W. Bond, F. Ivančić, N. Klarlund, and R. Trefler. Eclipse feature logic analysis. Second IP Telephony Workshop, 2001.
8. A. Bouajjani, B. Jonsson, M. Nilsson, and T. Touili. Regular model checking. In Proc. of CAV'00, LNCS, 2000.
9. A. Bouajjani, A. Muscholl, and T. Touili. Permutation rewriting and algorithmic verification. In Proc. of LICS'01, 2001.
10. D. Brand and P. Zafiropulo. On communicating finite-state machines. J. ACM, 30(2):323–342, 1983.
11. J. R. Büchi. On a decision method in restricted second order arithmetic. In Proc. of Int. Cong. on Logic, Methodology, and Philosophy of Science, 1960.
12. G. Cece, A. Finkel, and S. P. Iyer. Unreliable channels are easier to verify than perfect channels. Information and Computation, 124(1):20–31, 1996.
13. A. Finkel, S. P. Iyer, and G. Sutre. Well-abstracted transition systems: Application to FIFO automata. Information and Computation, 181(1):1–31, 2003.
14. A. Finkel and L. Rosier. A survey on the decidability questions for classes of FIFO nets. In Advances in Petri Nets, LNCS 340, pages 106–132. Springer, 1988.
15. M. Jackson and P. Zave. Distributed Feature Composition: A virtual architecture for telecommunications services. IEEE Trans. on Soft. Eng., 24(10):831–847, 1998.
16. Y. Kesten, O. Maler, M. Marcus, A. Pnueli, and E. Shahar. Symbolic model checking with rich assertional languages. Theo. Comp. Sci., 256(1–2):93–112, 2001.
17. N. Klarlund and R. Trefler. Regularity results for FIFO channels. In Proc. of AVoCS'04, 2004. To appear in Electronic Notes in Theoretical Computer Science.
18. M. Nivat and D. Perrin. Automata on Infinite Words. Springer Verlag, 1985.
19. J. K. Pachl. Protocol description and analysis based on a state transition model with channel expressions. Proc. of PSTV, pages 207–219, 1987.
20. J.-E. Pin. Syntactic semigroups. In Rozenberg and Salomaa, editors, Handbook of Language Theory, Vol. I. Springer Verlag, 1997.
21. J.-E. Pin and P. Weil. Polynomial closure and unambiguous product. Theory Comput. Systems, 30:1–39, 1997.
22. A. Pnueli. The temporal logic of programs. Proc. 18th FOCS, pages 46–57, 1977.
23. I. Simon. Piecewise testable events. Proc. of 2nd GI Conf., 33:214–222, 1975.
24. M. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proc. of LICS'86, pages 332–344, 1986.
25. P. Zave and M. Jackson. The DFC Manual. AT&T, 2001. Updated as needed. Available from http://www.research.att.com/projects/dfc.

Ranking Abstraction of Recursive Programs

Ittai Balaban1, Ariel Cohen1, and Amir Pnueli1,2

1 Dept. of Computer Science, New York University
2 Dept. of Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel
Abstract. We present a method for model checking safety and liveness properties of procedural programs, by combining state and ranking abstractions with procedure summarization. Our abstraction is an augmented finitary abstraction [KP00, BPZ05], meaning that a concrete procedural program is first augmented with a well-founded ranking function and then abstracted by a finitary state abstraction. This results in a procedural abstract program with strong fairness requirements, which is then reduced to a finite-state fair discrete system (FDS) using procedure summarization. This FDS is then model checked for the property.

1 Introduction

Procedural programs with unbounded recursion present a challenge to symbolic model checkers, since they ostensibly require the checker to model an unbounded call stack. In this paper we propose the integration of ranking abstraction [KP00, BPZ05], finitary state abstraction, procedure summarization [SP81], and model checking into a combined method for the automatic verification of LTL properties of infinite-state recursive procedural programs. The inputs to this method are a sequential procedural program together with state and ranking abstractions. The output is either "success", or a counterexample in the form of an abstract error trace. The method is sound, as well as complete, in the sense that for any valid property, a sufficiently accurate joint (state and ranking) abstraction exists that establishes its validity. The method centers around a fixpoint computation of procedure summaries of a finite-state program, followed by a subsequent construction of a behaviorally equivalent nondeterministic procedure-free program. Since we begin with an infinite-state program that cannot be summarized automatically, a number of steps involved in abstraction and LTL model checking need to be performed over the procedural (unsummarized) program. These include augmentation with non-constraining observers and fairness constraints required for LTL verification and ranking abstraction, as well as the computation of the state abstraction. Augmentation with global observers and fairness is modeled in such a way as to make the associated properties observable once procedures are summarized. In computing the abstraction, the abstraction of a procedure call is handled by abstracting "everything but" the call itself, i.e., local assignments and the binding of actual parameters to formals and of return values to variables.

This research was supported in part by NSF grant CCR-0205571, ONR grant N00014-99-10131, and SRC grant 2004-TJ-1256.

The method relies on machinery for computing the abstraction of first-order formulas, but is orthogonal to how the abstraction is actually computed. We have implemented a prototype based on the TLV symbolic model checker [Sha00] by extending it with a model of procedural programs. Specifically, given a symbolic finite-state model of a program, summaries are computed using BDD techniques in order to derive a fair discrete system (FDS) free of procedures, to which model checking is applied. The tool is provided, as input, with a concrete program together with predicates and ranking components. It computes predicate abstraction [GS97] automatically using the method proposed in [BPZ05].
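To illustrate the summarization step on an already-abstracted (finite-state) program, here is a small fixpoint sketch of our own; the names, edge encoding, and boolean domain are illustrative assumptions, whereas the actual tool computes summaries symbolically with BDDs. A procedure's summary is its set of input/output value pairs, and call edges are interpreted through the callees' current summaries until nothing changes.

```python
def summarize(procs, values=(0, 1)):
    """Least-fixpoint procedure summaries over a finite abstract domain.

    procs[p] = (entry, exit, edges); an edge (l1, l2, step) carries
    step = ("local", f) for an abstract assignment f, or
    step = ("call", q), whose effect is q's current summary.  Summaries
    grow monotonically over a finite domain, so iteration terminates.
    """
    summaries = {p: set() for p in procs}
    changed = True
    while changed:
        changed = False
        for p, (entry, exit_, edges) in procs.items():
            for v0 in values:
                reach = {(entry, v0)}            # intra-procedural reachability
                work = [(entry, v0)]             # under the current summaries
                while work:
                    loc, v = work.pop()
                    for l1, l2, (kind, arg) in edges:
                        if l1 != loc:
                            continue
                        outs = ([arg(v)] if kind == "local"
                                else [w for u, w in summaries[arg] if u == v])
                        for v2 in outs:
                            if (l2, v2) not in reach:
                                reach.add((l2, v2))
                                work.append((l2, v2))
                for loc, v in reach:
                    if loc == exit_ and (v0, v) not in summaries[p]:
                        summaries[p].add((v0, v))
                        changed = True
    return summaries

# P either flips its boolean input directly, or calls itself twice.
P = {"P": (0, 2, [(0, 2, ("local", lambda v: 1 - v)),
                  (0, 1, ("call", "P")),
                  (1, 2, ("call", "P"))])}
print(summarize(P)["P"])   # -> {(0, 1), (1, 0), (0, 0), (1, 1)}
```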
We have used this implementation to verify a number of canonical examples, such as Ackermann's function, the factorial function, and a procedural formulation of the 91 function. While most components of the proposed method have been studied before, our approach is novel in that it reduces the verification problem to that of symbolic model checking. Furthermore, it allows for the application of ranking and state abstractions while still relegating all summarization computation to the model checker. Another advantage is that fairness is supported directly by the model and the related algorithms, rather than being specified in a property.

1.1 Related Work

Recent work by Podelski et al. [PSW05] generalizes the concept of summaries to capture effects of computations between arbitrary program points. This is used to formulate a proof rule for total correctness of recursive programs with nested loops, in which a program summary is the auxiliary proof construct (analogous to an inductive invariant in an invariance proof rule). The rule and the accompanying formulation of summaries represent a framework in which abstract interpretation techniques and methods for ranking function synthesis can be applied. In this manner, both [PSW05] and our work aim at similar objectives. The main difference from our work is that, while we strive to work with abstraction of predicates, and use relations (and their abstraction) only for the treatment of procedures, the general approach of [PSW05] is based on the abstraction of relations even for the procedure-less case. A further difference is that, unlike our work, [PSW05] does not provide an explicit algorithm for the verification of arbitrary LTL properties; instead, it relies on a general reduction from proofs of termination to LTL verification. Recursive State Machines (RSMs) [AEY01, ABE+05] and Extended RSMs [ACEM05] enhance the power of finite state machines by allowing for the recursive invocation of state machines. They are used to model the control flow of programs containing recursive procedure calls, and to analyze reachability and cycle detection. They are, however, limited to programs with finite data. The method that we present in this paper, on the other hand, can be used to verify recursive programs over infinite data domains by making use of ranking and finitary state abstractions. In [BR00], an approach similar to ours for computing summary relations for procedures is implemented in the symbolic model checker Bebop. However, while Bebop is able to determine whether a specific program statement is reachable, it cannot prove termination of a recursive boolean program or any other liveness property.

The paper is organized as follows: In Section 2 we present the formal model of (procedure-free) fair discrete systems and the model checking of LTL properties over them. Section 3 formalizes recursive procedural programs presented as flow-graphs. In Section 4 we present a method for verifying the termination of procedural programs using ranking abstraction, state abstraction, summarization, construction of a procedure-free FDS, and, finally, model checking. In Section 5 we present a method for LTL model checking of recursive procedural programs. Finally, Section 6 concludes and discusses future work.
2 Background

2.1 Fair Discrete Systems

The computation model, fair discrete systems (FDS) D : ⟨V, Θ, ρ, J, C⟩, consists of the following components:
• V : A finite set of variables. We define a state s to be an interpretation of the variables in V; Σ denotes the set of all states over V.
• Θ : The initial condition, an assertion characterizing all the initial states of the FDS. A state is called initial if it satisfies Θ.
• ρ : A transition relation. This is an assertion ρ(V, V′), relating a state s ∈ Σ to its D-successor s′ ∈ Σ.
• J : A set of justice (weak fairness) requirements (assertions).
• C : A set of compassion (strong fairness) requirements. Each compassion requirement is a pair ⟨p, q⟩ of state assertions.

A run of an FDS is a sequence of states σ : s0, s1, . . . satisfying the following:
• Initiality: s0 is initial, i.e., s0 |= Θ.
• Consecution: For every j ≥ 0, the state sj+1 is a D-successor of the state sj.

A computation of an FDS is an infinite run which also satisfies:
• Justice: For every J ∈ J, σ contains infinitely many states satisfying J.
• Compassion: For every ⟨p, q⟩ ∈ C, σ includes either only finitely many p-states or infinitely many q-states.

An FDS D is said to be feasible if it has at least one computation. The synchronous parallel composition of systems D1 and D2, denoted D1 ||| D2, is the FDS D : ⟨V, Θ, ρ, J, C⟩ where V = V1 ∪ V2, Θ = Θ1 ∧ Θ2, ρ = ρ1 ∧ ρ2, J = J1 ∪ J2, and C = C1 ∪ C2. Synchronous parallel composition is used for the construction of an observer system O, which evaluates the behavior of another system D: running D ||| O allows D to behave as usual while O evaluates it.

2.2 Linear Temporal Logic

LTL is an extension of propositional logic with two additional basic temporal operators, ◯ (Next) and U (Until), from which ◇ (Eventually), □ (Always), and W (Waiting-for) can be derived. An LTL formula is a combination of assertions using the boolean operators ¬ and ∨ and the temporal operators: ϕ ::= p | ¬ϕ | ϕ ∨ ϕ | ◯ϕ | ϕ U ϕ. An LTL formula ϕ is satisfied by a computation σ, denoted σ |= ϕ, if ϕ holds at the initial state of σ. An LTL formula ϕ is D-valid, denoted D |= ϕ, if all the computations of an FDS D satisfy ϕ. Every LTL formula ϕ is associated with a temporal tester, an FDS denoted by T[ϕ]. A tester contains a distinguished boolean variable x such that for every computation σ of T[ϕ] and every position j ≥ 0, x[sj] = 1 ⇐⇒ (σ, j) |= ϕ. This construction is used for model checking an FDS D in the following manner:
• Construct a temporal tester T[¬ϕ] which is initialized with x = 1, i.e., an FDS comprising just those computations that falsify ϕ.
• Form the synchronous parallel composition D ||| T[¬ϕ], i.e., an FDS whose computations are exactly the computations of D that violate ϕ.
• Check the feasibility of D ||| T[¬ϕ]. D |= ϕ if and only if D ||| T[¬ϕ] is infeasible.
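A direct, explicit-state rendering of these definitions (a sketch of ours, not TLV's symbolic representation) shows how composition and an observer work; feasibility checking, which requires fair-cycle detection, is elided.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple

@dataclass
class FDS:
    """Explicit-state rendering of <V, Theta, rho, J, C> (sketch only)."""
    variables: FrozenSet[str]
    theta: Callable[[dict], bool]                 # initial condition
    rho: Callable[[dict, dict], bool]             # transition relation
    justice: List[Callable[[dict], bool]] = field(default_factory=list)
    compassion: List[Tuple[Callable, Callable]] = field(default_factory=list)

def compose(d1: FDS, d2: FDS) -> FDS:
    """Synchronous parallel composition D1 ||| D2, per the definition:
    union of variables and fairness sets, conjunction of Theta and rho."""
    return FDS(d1.variables | d2.variables,
               lambda s: d1.theta(s) and d2.theta(s),
               lambda s, t: d1.rho(s, t) and d2.rho(s, t),
               d1.justice + d2.justice,
               d1.compassion + d2.compassion)

# A counter and an observer that latches once x has been 3.
counter = FDS(frozenset({"x"}), lambda s: s["x"] == 0,
              lambda s, t: t["x"] == s["x"] + 1)
observer = FDS(frozenset({"seen"}), lambda s: not s["seen"],
               lambda s, t: t["seen"] == (s["seen"] or s["x"] == 3))
watched = compose(counter, observer)
assert watched.theta({"x": 0, "seen": False})
assert watched.rho({"x": 3, "seen": False}, {"x": 4, "seen": True})
```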
3 Recursive Programs

A program P consists of m+1 modules: P0, P1, ..., Pm, where P0 is the main module, and P1, ..., Pm are procedures that may be called from P0 or from other procedures. Each module Pi has the signature Pi(in x; out z) and is presented as a flow-graph with its own set of locations Li = {ℓi0, ℓi1, ..., ℓit}. It must have ℓi0 as its only entry point and ℓit as its only exit, and every other location must be on a path from ℓi0 to ℓit. It is required that the entry node has no incoming edges and that the terminal node has no outgoing edges. The variables of each module Pi are partitioned into y = (x; u; z). We refer to x, u, and z as the input, working (local), and output variables, respectively. A module cannot modify its own input variables.

3.1 Edge Labels

Each edge in the graph is labeled by an instruction that has one of the following forms:

• A local change d(y, y'), where d is an assertion over two copies of the module variables, labeling an edge e from a location ℓa to a location ℓc (form (1)). It is required that d(y, y') implies x' = x.

• A procedure call din(y, x2); Pj(x2, z2); dout(y, z2, y'), where x2 and z2 are fresh copies of the input and output parameters x and z, respectively (form (2)). This instruction represents a procedure call to procedure Pj in which several elements are non-deterministic. The assertion din(y, x2) determines the actual arguments that are fed into the variables of x2. It may also contain an enabling condition under which this transition is possible. The assertion dout(y, z2, y') updates the module variables y' based on the values returned by the procedure Pj via the output parameters z2. It is required that dout(y, z2, y') implies x' = x. Unless otherwise stated, we abbreviate a procedure call as a single edge e from ℓa to ℓc labeled by the entire instruction.

Example 1 (The 91 Function). Consider the functional program specified by

F(x) = if x > 100 then x − 10 else F(F(x + 11))    (3)

We refer to this function as F91. Fig. 1 shows the procedural version of F91. In the figure, as well as in subsequent examples, the notation v1 := f(v2) denotes v1' = f(v2) ∧ pres(y − v1), with pres(v) defined as v' = v, for some set of variables v.

Fig. 1. Procedural program F91. From ℓ0, an edge labeled x > 100 ∧ (z := x − 10) leads to the terminal location ℓ2, and an edge labeled x ≤ 100 ∧ x2 = x + 11; P(x2, z2); u := z2 leads to ℓ1; from ℓ1, an edge labeled x2 = u; P(x2, z2); z := z2 leads to ℓ2.
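As a quick sanity check of Example 1, the recursive definition (3) can be executed directly. The following sketch (ours, not part of the paper) confirms the well-known fact, used as the predicate g in Section 4.3 below, that F91 returns x − 10 for x > 100 and 91 otherwise.

import sys
sys.setrecursionlimit(10000)

def f91(x: int) -> int:
    # Direct transcription of F(x) = if x > 100 then x - 10 else F(F(x + 11))
    return x - 10 if x > 100 else f91(f91(x + 11))

def g(x: int) -> int:
    # The closed form used as a predicate base in Section 4.3
    return x - 10 if x > 100 else 91

assert all(f91(x) == g(x) for x in range(-50, 201))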
3.2 Computations

A computation of a program P is a maximal (possibly infinite) sequence of states and their labeled transitions:

σ : ⟨ℓ00; (ξ, ⊥, ⊥)⟩ −λ1→ ⟨ℓ1; ν1⟩ −λ2→ ⟨ℓ2; ν2⟩ · · ·

where each νi = (ξi, ηi, ζi) is an interpretation of the variables (x, u, z). The value ⊥ denotes an uninitialized value. Labels in the transitions are either names of edges in the program or the special label return. Each transition ⟨ℓ; ν⟩ −λ→ ⟨ℓ'; ν'⟩ in a computation must be justified by one of the following cases:

Assignment: There exists an assignment edge e of the form (1), such that ℓ = ℓa, λ = e, ℓ' = ℓc, and ⟨ν, ν'⟩ |= d(y, y').

Procedure Call: There exists a call edge e of the form (2), such that ℓ = ℓa, λ = e, ℓ' = ℓj0, and ν' = (ξ', ⊥, ⊥), where ⟨ν, ξ'⟩ |= din(y, x2).

Return: There exists a procedure Pj (the procedure from which we return), such that ℓ = ℓjt (the terminal location of Pj). The run leading up to ⟨ℓ; ν⟩ has a suffix of the form

⟨ℓ1; ν1⟩ −λ1→ ⟨ℓj0; (ξ; ⊥; ⊥)⟩ −λ2→ · · · −λk→ ⟨ℓ; (ξ; η; ζ)⟩

where the segment σ1 from ⟨ℓj0; (ξ; ⊥; ⊥)⟩ to ⟨ℓ; (ξ; η; ζ)⟩ is balanced (has an equal number of call and return labels), λ1 = e is a call edge of the form (2), ℓ' = ℓc, λ = return, and ⟨ν1, ζ, ν'⟩ |= dout(y, z2, y'). This definition uses the computation itself in order to retrieve the context as it was before the corresponding call to procedure Pj.

For a run σ1 : ⟨ℓ00; (ξ, ⊥, ⊥)⟩ −λ1→ · · · −λk→ ⟨ℓ; ν⟩, we define the level of the state ⟨ℓ; ν⟩, denoted Lev(⟨ℓ; ν⟩), to be the number of call edges in σ1 minus the number of return edges.

4 Verifying Termination

This section presents a method for verifying termination of procedural programs. Initially, the system is augmented with well-founded ranking components. Then a finitary state abstraction is applied, resulting in a finite-state procedural program. Procedure summaries are computed over the abstract, finite-state program, and a procedure-free FDS is constructed. Finally, infeasibility of the derived FDS is checked, showing that it does not possess a fair divergent computation. This establishes the termination of the original program.

4.1 A Proof Rule for Termination

The application of a ranking abstraction to procedures is based on a rule for proving termination of loop-free procedural programs. We choose a well-founded domain (D, ≻), and with each procedure Pi with input parameters x we associate a ranking function δi that maps x to D. For each edge e in Pi labeled by a procedure call of the form (2) to a procedure Pj, we generate the descent condition

De(y) : din(y, x2) → δi(x) ≻ δj(x2).

The soundness of this proof rule is stated by the following claim:

Claim 1 (Termination). If the descent condition De(y) is valid for every procedure call edge e in a loop-free procedural program P, then P terminates.

Proof: (Sketch) A non-terminating computation of a loop-free program must contain a subsequence of the form

⟨ℓ(0,0); (ξ0, ⊥, ⊥)⟩, ..., ⟨ℓ(0,i0); (ξ0, η0, ζ0)⟩, ⟨ℓ(j1,0); (ξ1, ⊥, ⊥)⟩, ..., ⟨ℓ(j1,i1); (ξ1, η1, ζ1)⟩, ⟨ℓ(j2,0); (ξ2, ⊥, ⊥)⟩, ..., ⟨ℓ(j2,i2); (ξ2, η2, ζ2)⟩, ⟨ℓ(j3,0); (ξ3, ⊥, ⊥)⟩, ...

where, for each k ≥ 0, Lev(⟨ℓ(jk,0); (ξk, ⊥, ⊥)⟩) = Lev(⟨ℓ(jk,ik); (ξk, ηk, ζk)⟩) = k. If the descent condition is valid for all call edges, this leads to the existence of the infinitely descending sequence

δ0(ξ0) ≻ δj1(ξ1) ≻ δj2(ξ2) ≻ δj3(ξ3) ≻ · · ·

which contradicts the well-foundedness of the δi's.

Space limitations disallow a proof of the following completeness result:

Claim 2 (Completeness). The method of proving termination is complete for loop-free programs.

Validity of the condition De is to be interpreted semantically. Namely, De(y) should hold for every ν such that there exists a computation reaching location ℓa with y = ν.

4.2 Ranking Augmentation of Procedural Programs

Ranking augmentation was suggested in [KP00] and used in [BPZ05] in conjunction with predicate abstraction to verify liveness properties of non-procedural programs. In its application here we require that a ranking function be applied only over the input parameters. Each procedure is augmented with a ranking observer variable that is updated at every procedure call edge e, in a manner corresponding to the descent condition De. For example, if the observer variable is inc, then a call edge din(y, x2); Pj(x2; z2); dout(y, z2, y') is augmented to be

din(y, x2) ∧ inc' = sign(δ(x2) − δ(x)); Pj(x2; z2); dout(y, z2, y') ∧ inc' = 0

All local assignments are augmented with the assignment inc := 0, as the ranking does not change locally within a procedure. Well-foundedness of the ranking function is captured by the compassion requirement (inc < 0, inc > 0), which is imposed only at a later stage. Unlike the termination proof rule, the ranking function need not decrease on every call edge. Instead, a program can be augmented with multiple similar components, and it is up to the feasibility analysis to sort out their interaction and relevance automatically.

Example 2 (Ranking Augmentation of Program F91). We now present an example of ranking abstraction applied to program F91 of Fig. 1. As a ranking component, we take

δ(x) = if x > 100 then 0 else 101 − x

Fig. 2 presents the program augmented by the variable inc.

Fig. 2. Program F91 augmented by a ranking observer. The notation ∆(x1, x2) denotes the expression sign(δ(x2) − δ(x1)). From ℓ0, an edge labeled x > 100 ∧ ((z, inc) := (x − 10, 0)) leads to ℓ2, and an edge labeled x ≤ 100 ∧ ((x2, inc') = (x + 11, ∆(x, x2))); P(x2, z2); (u, inc) := (z2, 0) leads to ℓ1; from ℓ1, an edge labeled (x2, inc') = (u, ∆(x, x2)); P(x2, z2); (z, inc) := (z2, 0) leads to ℓ2.
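To illustrate the descent condition and the inc observer on this instance, the following sketch (our own illustration; the paper's implementation is the TLV-based prototype of Section 6) evaluates δ of Example 2 on the two call edges of F91 over a finite range of inputs. Empirically, inc is −1 on both edges for all sampled reachable inputs, consistent with the observation after Example 4 below that inc is always −1 on the edge departing from ℓ1.

def delta(x: int) -> int:
    # Ranking component of Example 2: δ(x) = if x > 100 then 0 else 101 - x
    return 0 if x > 100 else 101 - x

def sign(n: int) -> int:
    return (n > 0) - (n < 0)

# First call edge: enabled when x <= 100, with actual argument x2 = x + 11.
# inc' = sign(δ(x2) - δ(x)) evaluates to -1, i.e. the rank strictly decreases.
assert all(sign(delta(x + 11) - delta(x)) == -1 for x in range(-1000, 101))

# Second call edge: actual argument x2 = u, where u is the result of the
# first call; using the closed form g(x + 11) for u (g as in the earlier sketch):
g = lambda x: x - 10 if x > 100 else 91
assert all(sign(delta(g(x + 11)) - delta(x)) == -1 for x in range(-1000, 101))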
4.3 Predicate Abstraction of Augmented Procedural Programs

We consider the application of finitary abstraction to procedural programs, focusing on predicate abstraction for clarity. We assume a predicate base that is partitioned into T = {I(x), W(y), R(x, z)}, with corresponding abstract (boolean) variables bT = {bI, bW, bR}. For each abstract procedure, bI, bW, and bR serve as the input parameters, working variables, and output parameters, respectively. An abstract procedure has the same control-flow graph as its concrete counterpart, where only the labels along the edges are abstracted, as follows:

• A local change relation d(y, y') is abstracted into the relation

D(bT, bT') : ∃y, y'. bT = T(y) ∧ bT' = T(y') ∧ d(y, y')

• A procedure call din(y, x2); Pj(x2, z2); dout(y, z2, y') is abstracted into the abstract procedure call Din(bT, bI²); Pj(bI², bR²); Dout(bT, bR², bT'), where

Din(bT, bI²) : ∃y, x2. bT = T(y) ∧ bI² = I(x2) ∧ din(y, x2)
Dout(bT, bR², bT') : ∃y, x2, z2, y'. bT = T(y) ∧ bR² = R(x2, z2) ∧ bT' = T(y') ∧ din(y, x2) ∧ dout(y, z2, y')

Example 3 (Abstraction of Program F91). We apply predicate abstraction to program F91 of Fig. 1. As a predicate base, we take

I : {x > 100},  W : {u = g(x + 11)},  R : {z = g(x)}

where

g(x) = if x > 100 then x − 10 else 91

The abstract domain consists of the corresponding boolean variables {BI, BW, BR}. The abstraction yields the abstract procedural program P(BI, BR), which is presented in Fig. 3.

Fig. 3. An abstract version of program F91. From ℓ0, an edge labeled BI ∧ (BR := 1) leads to ℓ2, and an edge labeled ¬BI; P(BI², BR²); BW := BR² leads to ℓ1; from ℓ1, an edge labeled P(BI², BR²); BR := (¬BW ∨ BI = BR²) leads to ℓ2.

Finally, we demonstrate the joint (predicate and ranking) abstraction of program F91.

Example 4 (Abstraction of Ranking-Augmented Program F91). We wish to abstract the augmented program from Example 2. Applying the abstraction based on the predicate set

I : {x > 100},  W : {u = g(x + 11)},  R : {z = g(x)}

we obtain the abstract program presented in Fig. 4, where

f(BI, BW, BI²) = if ¬BI ∧ (BI² ∨ (¬BI² ∧ BW)) then −1 else if BI ∧ BI² then 0 else 1

Fig. 4. An abstract version of program F91 augmented by a ranking observer. From ℓ0, an edge labeled BI ∧ ((BR, inc) := (1, 0)) leads to ℓ2, and an edge labeled ¬BI ∧ (BI², inc') = (?, −1); P(BI², BR²); (BW, inc) := (BR², 0) leads to ℓ1; from ℓ1, an edge labeled (BI², inc') = (?, f(BI, BW, BI²)); P(BI², BR²); (BR, inc) := (¬BW ∨ BI = BR², 0) leads to ℓ2.

Note that some (in fact, all) of the input arguments in the recursive calls are left non-deterministically 0 or 1. In addition, on return from the second recursive call, it is necessary to augment the transition with an adjusting assignment that correctly updates the local abstract variables based on the returned result. It is interesting to observe that all terminating calls to this abstract procedure return BR = 1, thus providing an independent proof that program F91 is partially correct with respect to the specification z = g(x).

The analysis of this abstract program yields that ¬BI ∧ BW is an invariant at location ℓ1. Therefore, the value of f(BI, BW, BI²) on the transition departing from location ℓ1 will always be −1. Thus, it so happens that even without feasibility analysis, we can conclude from Claim 1 that the program terminates.
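The existential abstraction D(bT, bT') above can be computed by brute force when the concrete domain is sampled over a finite range. The sketch below (our illustration, not the paper's implementation) abstracts the local change z := x − 10, taken under x > 100 in F91, relative to the predicates I : x > 100 and R : z = g(x), and finds that every abstract transition sets BR' = 1, matching the edge BI ∧ (BR := 1) of Fig. 3.

from itertools import product

def g(x: int) -> int:
    return x - 10 if x > 100 else 91

# Predicates over the concrete state (x, u, z)
I = lambda x, u, z: x > 100            # B_I
R = lambda x, u, z: z == g(x)          # B_R

# Concrete local change of F91: enabled when x > 100, z := x - 10
def d(s, t):
    (x, u, z), (x2, u2, z2) = s, t
    return x > 100 and x2 == x and u2 == u and z2 == x - 10

# Existential abstraction: an abstract pair (b, b') belongs to D iff some
# concrete witness pair (s, t) with d(s, t) abstracts to it.
SAMPLE = [(x, u, z) for x in range(90, 111) for u in (91, 0) for z in (91, 0, 95)]
D = {((I(*s), R(*s)), (I(*t), R(*t)))
     for s, t in product(SAMPLE, SAMPLE) if d(s, t)}

print(sorted(D))   # [((True, False), (True, True)), ((True, True), (True, True))]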
4.4 Summaries

A procedure summary is a relation between input and output parameters. A relation q(x, z) is a summary if it holds for given x and z iff there exists a run in which the procedure is called and returns, such that the input parameters are assigned x and, on return, the output parameters are assigned z. Since procedures may contain calls (recursive or not) to other procedures, deriving summaries involves a fixpoint computation. An inductive assertion network is generated that defines, for each procedure Pj, a summary q^j and an assertion ϕ^j_a associated with each location ℓa. For each procedure we construct a set of constraints according to the rules of Table 1. The constraint ϕ^j_t(x, u, z) → q^j(x, z) derives the summary from the assertion associated with the terminal location of Pj. All assertions besides ϕ^j_0 are initialized to false. ϕ^j_0, which refers to the entry location of Pj, is initialized to true, i.e., it allows the input variables to have any possible value at the entry location of procedure Pj. Note that the constraint for an edge labeled with a call to procedure Pi(x2; z2) incorporates the summary of that procedure, i.e., the summary computation of one procedure comprises the summaries of the procedures called from it. An iterative process is performed over the constraints contributed by all procedures in the program, until a fixpoint is reached. Reaching a fixpoint is guaranteed since all variables are of finite type.

Table 1. Rules for constraints contributed by procedure Pj to the inductive assertion network:
• Entry and exit: ϕ^j_0 = true, and ϕ^j_t(x, u, z) → q^j(x, z).
• A local change d(y, y') from ℓa to ℓc: ϕ^j_a(y) ∧ d(y, y') → ϕ^j_c(y').
• A call edge from ℓa to ℓc labeled din(y, x2); Pi(x2; z2); dout(y, z2, y') contributes, via intermediate points of the edge:
  ϕ^j_a(y) ∧ din(y, x2) → ϕ^j_a'(y, x2),
  ϕ^j_a'(y, x2) ∧ q^i(x2, z2) → ϕ^j_a''(y, z2),
  ϕ^j_a''(y, z2) ∧ dout(y, z2, y') → ϕ^j_c(y').

Claim 3 (Soundness). Given a minimal solution to the constraints of Table 1, q^j is a summary of Pj, for each procedure Pj.

Proof. In one direction, let σ : s0, ..., st be a computation segment starting at location ℓj0 and ending at ℓjt, such that x[s0] = v1 and z[st] = v2. It is easy to show by induction on the length of σ that st |= ϕ^j_t(x, u, z). From Table 1 we obtain ϕ^j_t(x, u, z) → q^j(x, z). Therefore st |= q^j(x, z). Since all edges satisfy x' = x, we obtain [x → v1, z → v2] |= q^j(x, z).

In the other direction, assume [x → v1, z → v2] |= q^j(x, z). From the constraints in Table 1 and the minimality of their solution, there exists a state st with x[st] = v1 and z[st] = v2 such that st |= ϕ^j_t. Repeating this reasoning, propagating backward, we can construct a computation segment starting at ℓj0 that initially assigns v1 to x.
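As an illustration of the fixpoint computation of Table 1, the following sketch (ours, for a hypothetical single boolean procedure of our own invention, not one from the paper) iterates the constraints until stabilization. The toy procedure is P(in b; out r): if b then r := 1, else call P(1, r2) and set r := r2; its summary turns out to be exactly the pairs with r = 1.

def compute_summary():
    # phi0/phi1/phi2 are the assertions at locations l0, l1, l2 (sets of
    # variable tuples); q is the summary, a set of (input, output) pairs.
    q = set()                                   # q initialized to false
    while True:
        phi0 = {(b,) for b in (0, 1)}           # entry assertion: true
        # direct edge l0 -> l2 under b = 1: r := 1
        phi2 = {(b, 1) for (b,) in phi0 if b == 1}
        # call edge under b = 0: din fixes the actual argument to 1, and
        # the callee's summary q relates it to the returned value r2
        phi1 = {(b, r2) for (b,) in phi0 if b == 0
                        for (b_in, r2) in q if b_in == 1}
        # dout: r := r2, completing the call edge at l2
        phi2 |= phi1
        new_q = set(phi2)                       # exit constraint: phi_t -> q
        if new_q == q:
            return q
        q = new_q

print(compute_summary())   # {(1, 1)} after one pass; {(0, 1), (1, 1)} at fixpoint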
4.5 Deriving a Procedure-Free FDS

Using summaries of an abstract procedural program PA, one can construct the derived FDS of PA, labeled derive(PA). This is an FDS denoting the set of reduced computations of PA, a notion formalized below. The variables of derive(PA) are partitioned into x, y, and z, which consist of the input, working, and output variables of all procedures, respectively. The FDS is constructed as follows:

– Edges labeled by local changes in PA are preserved in derive(PA).
– A procedure call in PA, denoted by an edge labeled din(y, x2); Pj(x2, z2); dout(y, z2, y') from a location ℓa to a location ℓc, is transformed into the following edges:
  • A summary edge e from ℓa to ℓc, labeled ∃x2, z2. din(y, x2) ∧ q^j(x2, z2) ∧ dout(y, z2, y').
  • A call edge e from ℓa to ℓj0, labeled din(y, x2).
– All compassion requirements, which are contributed by the ranking augmentation described in Subsection 4.2, are imposed on derive(PA).

The reasoning leading to this construction is that summary edges represent procedure calls that return, while call edges model non-returning procedure calls. Therefore, a summary edge leads to the next location in the calling procedure while modifying its variables according to the summary. A call edge, on the other hand, connects a calling location to the entry location of the procedure being called. Thus, a non-terminating computation consists of infinitely many call edges, and a call stack is not necessary.

We now prove the soundness of the construction. Recall the definition of a computation of a procedural program given in Subsection 3.2. A computation can be terminating or non-terminating. A terminating computation is finite, and has the property that every computation segment can be extended to a balanced segment, which starts with a calling step and ends with a matching return step. A computation segment is maximally balanced if it is balanced and is not properly contained in any other balanced segment.

Definition 1. Let σ be a computation of PA. Then the reduction of σ, labeled reduce(σ), is the sequence of states obtained from σ by replacing each maximal balanced segment by a summary-edge traversal step.

Claim 4. For any sequence of states σ, σ is a computation of derive(PA) iff there exists σ', a computation of PA, such that reduce(σ') = σ.

Proof of the claim follows from the construction of derive(PA) in a straightforward manner. It follows that if σ is a terminating computation of PA, then reduce(σ) consists of a single summary step in the part of derive(PA) corresponding to P0. If σ is an infinite computation of PA, then reduce(σ) (which must also be infinite) consists of all assignment steps and calls into procedures from which σ has not returned.

Claim 5 (Soundness – Termination). If derive(PA) is infeasible then P is a terminating program.

Proof. Let us first define the notion of abstraction of computations. Let σ = s0, s1, ... be a computation of P, the original procedural program from which PA was abstracted. The abstraction of σ is the computation α(s0), α(s1), ..., where for every i ≥ 0, if si is a state in σ, then α(si) = [bI → I(x), bW → W(y), bR → R(x, z)].

Assume that derive(PA) is infeasible, namely, every infinite run of derive(PA) violates some compassion requirement. Suppose, for contradiction, that P has an infinite computation σ. Consider reduce(σ), which consists of all steps in non-terminating procedure invocations within σ. Since the abstraction of reduce(σ) is a computation of derive(PA), it must be unfair with respect to some compassion requirement. It follows that a ranking function keeps decreasing over the steps of reduce(σ) and never increases, a contradiction.
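The transformation into derive(PA) is mechanical once summaries are available. Below is a small sketch of the edge-level rewriting (our own encoding: relations are represented as finite Python sets, so the existential quantification over x2 and z2 becomes a relational join; none of these names come from the paper).

def derive(edges, summaries, entry):
    """Rewrite an abstract procedural program into a procedure-free
    transition graph, in the style of Section 4.5.

    edges:     list of (src, dst, kind, payload); kind is 'local' or 'call'
    summaries: dict mapping procedure name j to its summary q_j, a set of
               (input, output) pairs
    entry:     dict mapping procedure name j to its entry location
    """
    derived = []
    for src, dst, kind, payload in edges:
        if kind == 'local':
            derived.append((src, dst, payload))        # preserved as-is
        else:
            j, din, dout = payload                     # call to procedure j
            # Summary edge: ∃x2, z2. din(y, x2) ∧ q_j(x2, z2) ∧ dout(y, z2, y')
            summary = {(y, y2) for (y, x2) in din
                               for (x2b, z2) in summaries[j] if x2b == x2
                               for (yb, z2b, y2) in dout if (yb, z2b) == (y, z2)}
            derived.append((src, dst, summary))
            # Call edge: din(y, x2), leading to the entry location of j
            derived.append((src, entry[j], din))
    return derived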
4.6 Analysis

The feasibility of derive(PA) can be checked by conventional symbolic model-checking techniques. If it is feasible, then there are two possibilities: (1) the original system truly diverges, or (2) feasibility of the derived system is spurious, that is, the state and ranking abstractions have admitted behaviors that were not originally present. In the latter case, the method presented here can be repeated with a refinement of either the state or the ranking abstraction. The precise nature of such refinement is outside the scope of this paper.

5 LTL Model Checking

In this section we generalize the method discussed so far to general LTL model checking. To this end we adapt to procedural programs the method discussed in Subsection 2.2 for model checking LTL by composition with temporal testers [KPR98]. We prepend the steps of the method of Section 4 with a tester composition step relative to an LTL property. Once ranking augmentation, abstraction, summarization, and construction of the derived FDS are computed, the resulting system is model-checked by conventional means as to the feasibility of initial states that do not satisfy the property. The main issue is that synchronous composition of a procedural program with a global tester, including justice requirements, needs to be expressed in terms of local changes to procedure variables. In addition, since LTL is modeled over infinite sequences, the derived FDS needs to be extended with idling transitions.

5.1 Composition with Temporal Testers

A temporal tester is defined by a unique global variable, here labeled t, a transition relation ρ(z, t, z', t') over primed and unprimed copies of the tester and program variables, where t does not appear in z, and a justice requirement. (We assume here that the property to be verified is defined over the output variables only.) In order to simulate global composition with ρ, we augment every procedure with the input and output parameters ti and to, respectively, as follows:

– An edge labeled by a local change is augmented with ρ(z, to, z', to').
– A procedure call of the form din(y, x2); Pj(x2, z2); dout(y, z2, y') is augmented to be
  din(y, x2) ∧ ρ(z, to, x2, ti²); Pj((x2, ti²), (z2, to²)); dout ∧ ρ(z2, to², z', to')
– Any edge leaving the initial location of a procedure is augmented with to = ti.

Example 5. Consider the program in Fig. 5. Since this program does not terminate, we are interested in verifying the property ϕ : (◇z) ∨ □◇ at_ℓ2, specifying that either eventually a state with z = 1 is reached, or location ℓ2 of P1 is visited infinitely often. To verify ϕ we decompose its negation into its principal temporal subformulas, □¬z and ◇□¬at_ℓ2, and compose the system with their temporal testers. Here we demonstrate the composition with T[□¬z], given by the transition relation t = ¬z ∧ t' and the trivial justice requirement true. The composition is shown in Fig. 6.

Fig. 5. A divergent program, consisting of a main module P0(x; z) and a procedure P1(x; z). Here init represents x > 0 ∧ z := 0, and main represents x ≥ 0 ∧ x2 := x; P1(x2; z2); z := z2. P0 either takes the edge x = 0 ∧ z := 1 to its terminal location or calls P1 via main; P1, after its init edge, recursively calls itself with x2 = x − 1; P1(x2; z2); z := z2 or with x2 = x + 1; P1(x2; z2); z := z2 (flow-graph omitted).

Fig. 6. The program of Fig. 5, composed with T[□¬z]; each procedure acquires the tester parameters, yielding P0(x, ti; z, to) and P1(x, ti; z, to). The assertion dout represents to² = (¬z2 ∧ to) ∧ z := z2, init represents x > 0 ∧ ti = to ∧ z := 0, and main represents x ≥ 0 ∧ (x2 = x) ∧ (to = ¬z ∧ ti²); P1(x2, ti²; z2, to²); dout (flow-graph omitted).

As a side remark, we note that our method can handle global variables in the same way as the global variables of testers, i.e., represent every global variable by a pair of input and output parameters and augment every procedure with these parameters and the corresponding transition relations.
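The tester T[□¬z] of Example 5 has a simple reading on finite prefixes: t holds at position j iff z is 0 from j onward. The following sketch (ours) checks, on a few finite traces, that this direct semantics satisfies the tester relation t = ¬z ∧ t' at every non-final position.

def box_not_z(trace):
    """Direct semantics of □¬z on a finite trace: t[j] iff z is 0 from j on."""
    n = len(trace)
    return [all(trace[k] == 0 for k in range(j, n)) for j in range(n)]

def satisfies_tester(trace, t):
    """Check the tester relation t = ¬z ∧ t' at every non-final position."""
    return all(t[j] == ((trace[j] == 0) and t[j + 1]) for j in range(len(t) - 1))

for trace in ([0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]):
    assert satisfies_tester(trace, box_not_z(trace))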
5.2 Observing Justice

In order to observe the justice imposed by a temporal tester, each procedure is augmented with a pair of observer variables consisting of a working variable and an output variable. Let J be a justice requirement, let Pi be a procedure, and let the associated observer variables be Ju and Jo. Pi is augmented as follows: On initialization, both Ju and Jo are assigned true if the property J holds at that state. Local changes are conjoined with Ju := J' and Jo := (Jo ∨ J'). Procedure calls are conjoined with Ju := (J' ∨ Jo²) and Jo := (Jo ∨ J' ∨ Jo²), where Jo² is the relevant output observer variable of the procedure being called. While Ju observes J at every location, once Jo becomes true it remains so up to the terminal location. Since Jo participates in the procedure summary, it is used to denote whether justice has been satisfied within the called procedure.

5.3 The Derived FDS

We use the basic construction of Section 4.5 in deriving the FDS. In addition, for every non-output observer variable Ju we impose the justice requirement that in any fair computation, Ju must be true infinitely often. Since LTL is modeled over infinite sequences, we must also ensure that terminating computations of the procedural program are represented by infinite sequences. To this end we simply extend the terminal location of procedure P0 with a self-looping edge. Thus, a terminating computation is one that eventually reaches the terminal location of P0 and idles there henceforth. In this section we use the notation derive(PA) to denote the FDS that is derived from PA and thus extended. The following claim of soundness is presented without proof due to space limitations.

Claim 6 (Soundness – LTL). Let P be a procedural program, ϕ be a formula whose principal operator is temporal, and PA be the abstract program resulting from the composition of P with the temporal tester T[¬ϕ] and its abstraction relative to a state and ranking abstraction. Let to be the tester variable of T[¬ϕ]. If to = true is an infeasible initial state of derive(PA), then ϕ is valid over P.

6 Conclusion

We have described the integration of ranking abstraction, finitary state abstraction, procedure summarization, and model checking into a combined method for the automatic verification of LTL properties of infinite-state recursive procedural programs. Our approach is novel in that it reduces the verification problem of procedural programs with unbounded recursion to that of symbolic model checking. Furthermore, it allows for the application of ranking and state abstractions while still relegating all summarization computation to the model checker. Another advantage is that fairness is supported directly by the model, rather than being specified in a property. We have implemented a prototype based on the TLV symbolic model checker and tested several examples, such as Ackermann's function, the Factorial function, and a recursive formulation of the 91 function. We verified that they all terminate, and model checked several LTL properties.
As further work it would be interesting to investigate concurrency with bounded context switching, as suggested in [RQ05]. Another direction is the exploration of different versions of LTL that can relate to the nesting levels of procedure calls, similar to the manner in which the CARET logic [AEM04] expresses properties of recursive state machines concerning the call stack.

References

[ABE+05] R. Alur, M. Benedikt, K. Etessami, P. Godefroid, T.W. Reps, and M. Yannakakis. Analysis of recursive state machines. ACM Trans. Program. Lang. Syst., 27(4):786–818, 2005.
[ACEM05] R. Alur, S. Chaudhuri, K. Etessami, and P. Madhusudan. On-the-fly reachability and cycle detection for recursive state machines. In TACAS'05, pages 61–76.
[AEM04] R. Alur, K. Etessami, and P. Madhusudan. A temporal logic of nested calls and returns. In TACAS'04, pages 467–481.
[AEY01] R. Alur, K. Etessami, and M. Yannakakis. Analysis of recursive state machines. In CAV'01, pages 207–220.
[BPZ05] I. Balaban, A. Pnueli, and L.D. Zuck. Shape analysis by predicate abstraction. In VMCAI'05, pages 164–180.
[BR00] T. Ball and S.K. Rajamani. Bebop: A symbolic model checker for boolean programs. In SPIN'00, pages 113–130.
[GS97] S. Graf and H. Saïdi. Construction of abstract state graphs with PVS. In CAV'97, pages 72–83.
[KP00] Y. Kesten and A. Pnueli. Verification by augmented finitary abstraction. Information and Computation, 163(1):203–243, 2000.
[KPR98] Y. Kesten, A. Pnueli, and L. Raviv. Algorithmic verification of linear temporal logic specifications. In CAV'98, pages 1–16.
[PSW05] A. Podelski, I. Schaefer, and S. Wagner. Summaries for while programs with recursion. In ESOP'05, pages 94–107.
[RQ05] J. Rehof and S. Qadeer. Context-bounded model checking of concurrent software. In TACAS'05, pages 93–107.
[Sha00] E. Shahar. The TLV Manual, 2000. http://www.cs.nyu.edu/acsys/tlv.
[SP81] M. Sharir and A. Pnueli. Two approaches to inter-procedural data-flow analysis. In Jones and Muchnik, editors, Program Flow Analysis: Theory and Applications. Prentice-Hall, 1981.

Relative Safety

Joxan Jaffar, Andrew E. Santosa, and Răzvan Voicu

School of Computing, National University of Singapore, S16, 3 Science Drive 2, Singapore 117543, Republic of Singapore
{joxan, andrews, razvan}@comp.nus.edu.sg

Abstract. A safety property restricts the set of reachable states. In this paper, we introduce a notion of relative safety which states that certain program states are reachable provided certain other states are. A key, but not exclusive, application of this method is in representing symmetry in a program. Here, we show that relative safety generalizes the programs that are presently accommodated by existing methods for symmetry. Finally, we provide a practical algorithm for proving relative safety.

1 Introduction

A safety property restricts the set of reachable states. Let [[P]] denote the collecting semantics of a program P with variables X̃. Thus each sequence x̃ of variable values in [[P]] represents a reachable state. A safety property may be simply written as a constraint Ψ over the variables X̃. For example, the safety property X + Y < 9 states that in all reachable states, the values of the program variables X and Y sum to less than 9. If we let the predicate p(x̃) be true just in case the sequence of values of program variables x̃ is in [[P]], then a safety property may be written in the form p(X̃) |= Ψ, for example, p(X, Y) |= X + Y < 9.

In this paper, we introduce the notion of relative safety.
Briefly and informally, this asserts that a certain state is reachable provided a certain other state is reachable. Note that this does not mean that these two states share a computation path. Specifically, consider the specification of states in the form p(X̃) ∧ Ψ. That is, we use the constraint Ψ to identify the set of solutions of Ψ which correspond to reachable states. Then our notion of relative safety simply relates two of these specifications in the following way:

p(X̃) ∧ Ψ |= p(Ỹ) ∧ Ψ'

where Ψ and Ψ' are constraints over X̃, Ỹ. For example, p(X1, X2) |= p(Y1, Y2) ∧ X1 = Y2 ∧ X2 = Y1 (or more succinctly, p(X1, X2) |= p(X2, X1)) asserts that if the state (α, β) is reachable, then so is (β, α), for all values α and β. In other words, the observable values of the two program variables commute.

Relative safety can specify powerful structural properties of programs. The driving application we consider in this paper is that of verification with symmetry reduction. Symmetry has been widely employed for minimizing the search space in program verification. It is a reduction technique employed in the Murϕ [13] and SMC [21] model checkers, among many others. Symmetry is often defined using automorphisms π on the symmetric objects. These induce an equivalence relation between program states. Efficiency in state exploration is then achieved by checking only representatives of the equivalence classes.

Let us take as an example a concurrent program with two almost identical processes, where process 1 updates the variables PC1 and X1, and process 2 updates PC2 and X2. Here PC1 and PC2 are process 1's and process 2's program counters, respectively. Let (α, β, γ, δ) be the values of (PC1, PC2, X1, X2). Classic symmetry "exchanges" processes 1 and 2, that is, π((α, β, γ, δ)) = (β, α, δ, γ). A necessary condition for π to be an automorphism is that whenever x̃ is a reachable state, so is π(x̃). Such a relation between x̃ and π(x̃) can be logically represented as the relative safety assertion

p(PC1, PC2, X1, X2) |= p(PC2, PC1, X2, X1)

where the predicate p, once again, represents the reachable states of the program. Below we show many more examples of symmetry, including ones that are not covered by existing techniques.

The main technical part of this paper is a proof method. In its most basic form, the method to prove the assertion G1 |= G2 checks that the set of states represented by the symbolic formula G2 is reachable whenever the set G1 is reachable. This is done by the basic operation of "backward unfolding" of the program's transition relation. A key element in our algorithm is the use of the principle of coinduction, which is critical for the termination of the unfolding process.

The paper is organized as follows. We discuss some related work in Section 2. We then formalize the program semantics and the proof method in the framework of Constraint Logic Programming (CLP) [14], for two main reasons. First, the logical framework of CLP is eminently suitable for the representation of our concept of relative safety, and second, the established implementation technology of CLP systems allows us to perform unfolding operations efficiently. We introduce some preliminary CLP concepts in Section 3. Relative safety is then formally defined in Section 4. Here, we show via several examples novel ways to realize symmetry.
In addition to these, we will also show a non-symmetry example. Section 5 formally presents our algorithm. Finally, in Section 6, we demonstrate the use of our prototype implementation on some classes of programs in order to show the practical potential of our algorithm.

2 Related Work

Existing approaches define symmetry on syntactic considerations. In contrast, our notion of relative safety is based on semantics. An advantage is more flexibility in specifying a wide range of symmetry-like properties, including many that would not be considered symmetry properties by the existing methods. One example, shown later, is a mutual exclusion algorithm with priority between processes. We can handle a wider range than [7, 20], for example. Importantly, relative safety goes far beyond symmetry (below, we demonstrate the property of serializability).

In more detail, symmetry is often defined as a transition-preserving equivalence [8, 3, 13, 9, 20], where an automorphism π, besides being a bijection on the reachable states, also satisfies that (x̃, x̃') is a transition iff (π(x̃), π(x̃')) is. Another notion of equivalence used is bisimilarity [7], which replaces the second condition with bisimilarity on the state graph. These stronger equivalences allow for the handling of a larger class of properties beyond safety, such as CTL* properties. However, stronger equivalence also means less freedom in handling symmetries on the collecting semantics, which we exploit further in this paper.

While still defining symmetry as transition-preserving equivalence, [20] attempts to handle systems whose state graphs are not fully symmetric. The approach transforms the state graph into a fully symmetric one, while keeping an annotation for each transition that has no correspondence in the original state graph. The graph with full symmetry is then reduced by equating automorphic states. This work is the most general and can reduce the state graph of even totally asymmetric programs; however, its application is limited to programs with syntactically specified static transition priority.

Similar to the work of [20], prior works infer symmetry based on syntactic conditions, such as concurrent programs with identical processes or syntactic restrictions on program statements and variable usage. These include the scalarset approach of Murϕ [13], and the limitation to permutations of process identifiers in the SMC model checker [21]. In contrast, our approach of proving symmetry semantically for each program enables us to treat more programs, where the semantics is symmetric although the syntax is not. An application of our symmetry proof method has been demonstrated in the context of timed automata verification [16]. This paper presents a generalization and automation of that method.

There have been many works in the area of verification using CLP (see [11] for a non-exhaustive survey), partly because it is natural to express transition relations as CLP rules. Due to its ability to handle constraints, CLP has notably been used in the verification of infinite-state systems [5, 10, 12, 17], although results for finite-state systems are also available [18, 19]. None of these works, however, deals with relative safety.

3 CLP Representation of Programs

We start by stipulating that each process in a concurrent program has the usual syntax of a deterministic imperative language, and communication occurs via shared variables.
We also have a blocking primitive await (b) s, where b is a boolean expression and s a program statement; it can be executed only when b holds. A program is a collection of a fixed number of processes. We provide the 2-process bakery algorithm in Figure 1 as an example. We display program points in angle brackets.

Fig. 1. Bakery-2.
Process 1: while (true) do ⟨0⟩ t1 := t2 + 1; ⟨1⟩ await (t1 < t2 ∨ t2 = 0) skip; ⟨2⟩ t1 := 0 end
Process 2: while (true) do ⟨0⟩ t2 := t1 + 1; ⟨1⟩ await (t2 < t1 ∨ t1 = 0) skip; ⟨2⟩ t2 := 0 end

We now introduce CLP programs. CLP programs have a universe of discourse D, which is a set of terms, integers, and arrays of integers. A constraint is written using a language of functions and relations. Constraints are used in two ways: in the base programming language, to describe expressions and conditionals, and in user assertions, defined below. In this paper, we do not define the constraint language explicitly, but invent it on demand in accordance with our examples. Thus the terms of our CLP programs include the function symbols of the constraint language.

An atom is, as usual, of the form p(t̃), where p is a user-defined predicate symbol and t̃ a tuple of terms. The set {p(d̃)}, where p ranges over the predicates and d̃ over the tuples of values in D, is called the domain base B of our CLP programs. Now, a CLP program is a set of rules. A rule is an implication of the form A ← B̃, φ, where the atom A is the head of the rule, and the sequence of atoms B̃ together with the constraint φ constitutes the body of the rule. We say that a rule is a (constrained) fact if B̃ is the empty sequence.

Translating a user program P0 into an appropriate CLP program P is intuitively straightforward; we thus provide only an informal outline here. The CLP rule corresponding to a transition of the program is of the form

p(PC', X1', X2', ..., Xn') ← p(PC, X1, X2, ..., Xn), φ.

Here, PC is a list representing the program counters of the k processes of P0 before the transition, and its primed counterpart PC' represents the list after the transition. X1, X2, ..., Xn and their primed counterparts represent the variables of P0 before and after the transition, while φ is a constraint on all the variables. Note that, as in the above rule, throughout this paper we often use a comma in place of ∧ to denote conjunction. The above rule depicts a transition from the rhs to the lhs.

Example 1 (Bakery-2). Consider our 2-process bakery algorithm in Figure 1. Note that point ⟨2⟩ indicates the critical section, and initially t1 = t2 = 0. The CLP program in Figure 2 (the parts preceded by % are comments) is its CLP representation.

Fig. 2. CLP Representation of Bakery-2.
p([0, 0], T1, T2) ← T1 = 0, T2 = 0.                          % init
p([1, P2], T1', T2) ← p([0, P2], T1, T2), T1' = T2 + 1.      % r1
p([2, P2], T1, T2) ← p([1, P2], T1, T2), (T1 < T2 ∨ T2 = 0). % e1
p([0, P2], T1', T2) ← p([2, P2], T1, T2), T1' = 0.           % x1
p([P1, 1], T1, T2') ← p([P1, 0], T1, T2), T2' = T1 + 1.      % r2
p([P1, 2], T1, T2) ← p([P1, 1], T1, T2), (T2 < T1 ∨ T1 = 0). % e2
p([P1, 0], T1, T2') ← p([P1, 2], T1, T2), T2' = 0.           % x2
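Anticipating the least-model semantics defined next, the rules of Fig. 2 can be executed directly as a forward fixpoint. The following sketch (our illustration; the paper's implementation is in CLP(R)) computes the reachable states of Bakery-2 with ticket values bounded by a constant, which under-approximates the infinite-state system but suffices for a sanity check; it then checks the mutual exclusion property p([2,2], T1, T2) |= false on the explored states.

BOUND = 6   # explore ticket values 0..BOUND only (an under-approximation)

def successors(s):
    (p1, p2), t1, t2 = s
    out = []
    if p1 == 0 and t2 + 1 <= BOUND: out.append(((1, p2), t2 + 1, t2))   # r1
    if p1 == 1 and (t1 < t2 or t2 == 0): out.append(((2, p2), t1, t2))  # e1
    if p1 == 2: out.append(((0, p2), 0, t2))                            # x1
    if p2 == 0 and t1 + 1 <= BOUND: out.append(((p1, 1), t1, t1 + 1))   # r2
    if p2 == 1 and (t2 < t1 or t1 == 0): out.append(((p1, 2), t1, t2))  # e2
    if p2 == 2: out.append(((p1, 0), t1, 0))                            # x2
    return out

# Least-model computation: iterate from the initial fact until stable
reach = {((0, 0), 0, 0)}
frontier = set(reach)
while frontier:
    frontier = {t for s in frontier for t in successors(s)} - reach
    reach |= frontier

# Mutual exclusion: p([2,2], T1, T2) |= false on the explored states
assert not any(pc == (2, 2) for pc, _, _ in reach)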
The semantics of a CLP program is based on the concept of ground instances. A ground instance of a constraint φ is obtained by instantiating the variables therein with values from D, and the result is true or false. We write this as φσ [14], where σ : var(φ) → D is a grounding. Similarly, a ground instance of an atom or rule is obtained by instantiating the variables therein with values from D using a grounding σ.

Now consider the fixpoint operator TP : 2^B → 2^B for a CLP program P, defined as follows: a ground atom Aσ is in TP(S) if Aσ ∈ S or there is a ground instance (A ← B̃, φ)σ of a rule A ← B̃, φ in P such that B̃σ ⊆ S and φσ is true. A basic theorem of CLP is that the least fixpoint of TP is the least model of P, and this is also equal to the set of true ground atoms. We denote this set by [[P]]. A ground instance Aσ is true iff Aσ ∈ [[P]]. Similarly, a ground instance (B̃, φ)σ of a goal is true iff B̃σ ⊆ [[P]] and φσ is true. We denote the set of true ground instances of a goal G by [[G]]. In general, where P is the CLP representation of P0, the collecting semantics of P0 is characterized by [[P]].

4 Relative Safety

We now present an assertion language to express the relative safety property, and demonstrate its expressive power for program reasoning. We start with the definition of a constraint state.

Definition 1 (Constraint State). A constraint state is a goal of the form p(PC, X1, ..., Xn), φ, where PC, X1, ..., Xn represent the list of program counters and the program variables, and φ is a constraint on the variables.

Now let GL be a constraint state and GR either a constraint or a constraint state. Let X̃ = var(GL) ∪ var(GR).

Definition 2 (Relative Safety). A relative safety assertion is of the form GL |= GR. Its meaning is ∀X̃ : GL → GR, that is, for each grounding σ such that GLσ ∈ [[GL]], we have GRσ ∈ [[GR]].

Intuitively, a relative safety assertion specifies that certain states are reachable only if certain other states are. Consider first a traditional safety property, generally of the form p(PC, X̃), φ |= φ', where φ and φ' are constraints on the program counter list PC and the program variables X̃. For example, in the Bakery-2 program, the following assertions specify mutual exclusion:

p([P1, P2], T1, T2) |= ¬(P1 = 2 ∧ P2 = 2),  or,  p([2, 2], T1, T2) |= false

Now consider a relative safety assertion stating symmetry for Bakery-2:

p([P1, P2], T1, T2) |= p([P2, P1], T2, T1).

Note that an automorphism must belong to a group whose operator is the composition of automorphisms [23]. Such a group is known as an automorphism group. Our idea is to use a set of relative safety assertions to specify possible automorphisms on the reachable states. Note that a single relative safety assertion in general only describes a partial mapping, while an automorphism is total. In general we need a set of assertions to describe a total mapping π. Moreover, equivalence between states is obtained by also proving a complete set of assertions which represent the mappings in an automorphism group. This includes inverses, whose proofs are often straightforward. Suppose that map(GL |= GR) is the mapping represented by the assertion GL |= GR. Then, as an example, the above symmetry assertion for Bakery-2 characterizes an automorphism group Aut on the collecting semantics as follows:

– We include the obvious map(p([P1, P2], T1, T2) |= p([P1, P2], T1, T2)) in Aut, satisfying the existence of an identity.
– By the simple renaming {P1 → P2, P2 → P1, T1 → T2, T2 → T1} on the above assertion, the reverse map(p([P2, P1], T2, T1) |= p([P1, P2], T1, T2)) is in Aut, satisfying the existence of inverses.
– It is straightforward to show that if map(G1 |= G2) ∈ Aut and map(G2 |= G3) ∈ Aut, then map(G1 |= G3) ∈ Aut.

We will prove the assertion later in Section 5.
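Before turning to the examples, note that on a finite, explicitly computed fragment of the collecting semantics, the Bakery-2 symmetry assertion amounts to a one-line swap test. The sketch below (ours) continues the bounded exploration given in Section 3 and assumes the set reach computed there; the general proof, without bounds, is the subject of Section 5.

# Relative safety as a swap test on the explored states:
# p([P1,P2], T1, T2) |= p([P2,P1], T2, T1)
def swapped(s):
    (p1, p2), t1, t2 = s
    return ((p2, p1), t2, t1)

assert all(swapped(s) in reach for s in reach)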
We now proceed with several examples.

Example 2 (Rotational Symmetry). We demonstrate rotational symmetry in the solution of the N dining philosophers' problem using N − 1 tickets. For simplicity, we assume there are N = 3 philosophers with ids 1, 2 and 3, and 3 forks represented as a boolean array f, where f[1], f[2], f[3] are the forks between philosophers 3 and 1, 1 and 2, and 2 and 3, respectively. Initially the ticket number is t = 2. To save space, we do not show the actual code. For our purpose it suffices to state the rotational symmetry as the assertion:

p([P1, P2, P3], F1, F2, F3, T) |= p([P3, P1, P2], F3, F1, F2, T),

where Pi denotes the program point of philosopher i, F1, F2 and F3 are the values of f[i], 1 ≤ i ≤ 3, respectively, and T is the number of tickets left. The above assertion specifies a cyclic shift. For this example, an arbitrary transposition does not result in an automorphism.

Example 3 (Permutation of Variable-Value Pairs). In [16] we discussed a timed automata version of Fischer's algorithm, a timing-based mutual exclusion algorithm. The pseudocode can be found in [1] and is not presented here to save space. The algorithm uses a global variable k whose value is the process identifier of the process that is about to enter the critical section. This is translated into a variable K in our CLP representation (also not shown here). Since the example uses timing, our CLP representation for the 2-process version uses the variables T1 and T2, denoting the running time of each process. Our symmetry assertion here is

p([P1, P2], T1, T2, K) |= p([P2, P1], T2, T1, K'), φ,

where φ constrains (K, K') to (0, 0), (1, 2) or (2, 1). This is called permutation of variable-value pairs [20], since it maps the value of a variable onto a new one without exchanging it with another variable. This is not covered by some previous approaches, such as [13, 21].

Example 4 (Priority Mutual Exclusion). We can also express a kind of "approximate" symmetry, as exemplified by the simple 2-process priority mutual exclusion algorithm of Figure 3. Each process has ⟨2⟩ as its critical section. Initially, the values of both x1 and x2 are 0. We show the CLP representation in Figure 4.

Fig. 3. Priority Mutual Exclusion.
Process 1: while (true) do ⟨0⟩ await (x2 = 0) x1 := 1; ⟨1⟩ skip; ⟨2⟩ x1 := 0 end
Process 2: while (true) do ⟨0⟩ x2 := 1; ⟨1⟩ await (x1 = 0) skip; ⟨2⟩ x2 := 0 end

Fig. 4. CLP Representation of Priority Mutual Exclusion.
p([0, 0], 0, 0).
p([1, P2], 1, X2) ← p([0, P2], X1, X2), X2 = 0.
p([2, P2], X1, X2) ← p([1, P2], X1, X2).
p([0, P2], 0, X2) ← p([2, P2], X1, X2).
p([P1, 1], X1, 1) ← p([P1, 0], X1, X2).
p([P1, 2], X1, X2) ← p([P1, 1], X1, X2), X1 = 0.
p([P1, 0], X1, 0) ← p([P1, 2], X1, X2).

This example is semantically similar to the asymmetric readers-writers in [6] and the priority mutual exclusion in [20]. Although the state graph of the program is not symmetric, the state space, i.e., the set of nodes in the state graph, is, and knowing this is already useful for proving safety properties such as mutual exclusion. We can represent the symmetry on the state space simply as:

p([P1, P2], X1, X2) |= p([P2, P1], X2, X1).

It is not immediately obvious from syntactic observation alone that the program is symmetric.

Example 5 (Szymanski's Algorithm). Szymanski's algorithm is a more complex priority-based mutual exclusion algorithm which is commonly encountered in the literature. We show the pseudocode in Figure 5. Its CLP representation is in Figure 6.
Roughly speaking, since the algorithm gives Process 1 priority to enter the critical section ⟨8⟩, it is not possible for Process 2 to be in its critical section while Process 1 is at its trying section. For example, the following does not hold:

p([8, 7], X1, X2) |= p([7, 8], X2, X1).

This is because the program points [8, 7] are reachable while [7, 8] are not. In other words, there is a grounding for the lhs goal, but no grounding for the rhs goal. Therefore, a simple symmetry assertion such as the one given for the bakery algorithm does not hold. However, the following "not-quite" symmetry assertions still hold:

p([8, P2], X1, X2), P2 < 3 |= p([P2, 8], X2, X1).
p([8, P2], X1, X2), P2 > 7 |= p([P2, 8], X2, X1).
p([9, P2], X1, X2), P2 ≠ 7 |= p([P2, 9], X2, X1).
p([P1, P2], X1, X2), P1 ≠ 8, P1 ≠ 9 |= p([P2, P1], X2, X1).

Fig. 5. 2-Process Szymanski's Algorithm.
Process 1: while (true) do ⟨0⟩ x1 := 1; ⟨1⟩ await (x2 < 3) skip; ⟨2⟩ x1 := 3; ⟨3⟩ if (x2 = 1) do ⟨4⟩ x1 := 2; ⟨5⟩ await (x2 = 4) skip end; ⟨6⟩ x1 := 4; ⟨7⟩ skip; ⟨8⟩ await (x2 < 2 ∨ x2 > 3) skip; ⟨9⟩ x1 := 0 end
Process 2: while (true) do ⟨0⟩ x2 := 1; ⟨1⟩ await (x1 < 3) skip; ⟨2⟩ x2 := 3; ⟨3⟩ if (x1 = 1) do ⟨4⟩ x2 := 2; ⟨5⟩ await (x1 = 4) skip end; ⟨6⟩ x2 := 4; ⟨7⟩ await (x1 < 2) skip; ⟨8⟩ skip; ⟨9⟩ x2 := 0 end

Fig. 6. CLP Representation of Szymanski's Algorithm.
p([0, 0], 0, 0).   % initial state
% Rules for Process 1:
p([1, P2], 1, X2) ← p([0, P2], X1, X2).
p([2, P2], X1, X2) ← p([1, P2], X1, X2), X2 < 3.
p([3, P2], 3, X2) ← p([2, P2], X1, X2).
p([4, P2], X1, X2) ← p([3, P2], X1, X2), X2 = 1.
p([5, P2], 2, X2) ← p([4, P2], X1, X2).
p([6, P2], X1, X2) ← p([3, P2], X1, X2), X2 ≠ 1.
p([6, P2], X1, X2) ← p([5, P2], X1, X2), X2 = 4.
p([7, P2], 4, X2) ← p([6, P2], X1, X2).
p([8, P2], X1, X2) ← p([7, P2], X1, X2).
p([9, P2], X1, X2) ← p([8, P2], X1, X2), (X2 < 2 ∨ X2 > 3).
p([0, P2], 0, X2) ← p([9, P2], X1, X2).
% Rules for Process 2:
p([P1, 1], X1, 1) ← p([P1, 0], X1, X2).
p([P1, 2], X1, X2) ← p([P1, 1], X1, X2), X1 < 3.
p([P1, 3], X1, 3) ← p([P1, 2], X1, X2).
p([P1, 4], X1, X2) ← p([P1, 3], X1, X2), X1 = 1.
p([P1, 5], X1, 2) ← p([P1, 4], X1, X2).
p([P1, 6], X1, X2) ← p([P1, 3], X1, X2), X1 ≠ 1.
p([P1, 6], X1, X2) ← p([P1, 5], X1, X2), X1 = 4.
p([P1, 7], X1, 4) ← p([P1, 6], X1, X2).
p([P1, 8], X1, X2) ← p([P1, 7], X1, X2), X1 < 2.
p([P1, 9], X1, X2) ← p([P1, 8], X1, X2).
p([P1, 0], X1, 0) ← p([P1, 9], X1, X2).

At first it seems that the above assertions no longer define an automorphism group, since p([P1, 8], X1, X2), 3 ≤ P1 ≤ 7 |= p([8, P1], X2, X1) can be derived from the last assertion, yet its inverse does not hold. However, by observation, the assertion p([P1, 8], X1, X2) |= P1 < 3 ∨ P1 > 7 holds, since it is not possible for Process 2 to be in the critical section while Process 1 is waiting. Similarly, p([P1, 9], X1, X2) |= P1 ≠ 7 also holds. These impose restrictions on the last assertion above. We are not aware of any verification technique that would allow us to express and use this kind of symmetry.

Example 6 (Serializability). We next discuss an application of relative safety assertions beyond symmetry. We show a producer/consumer program in Figure 7, whose CLP representation is in Figure 8. The macros conk() and prol() abstract program fragments that serve to consume and produce, respectively. We imagine that apart from the variable full there are other variables x which may be used in conk() and prol(). Consider the assertions:

p([n + 1, P2], Full, f(X)), P2 ≤ n |= p([1, P2], Full, X).
p([P1, n], Full, g(X)), P1 ≥ 1 |= p([P1, 0], Full, X).

where the expressions f(X) and g(X) are the results of performing con1() ... conn() and pro1() ... pron(), respectively, on X. The assertions say that the result of performing an interleaving of the conk() and prol() macros, 1 ≤ k ≤ P1 − 1, 1 ≤ l ≤ P2, is as though the two sequences of transitions were serialized. Note that here we still have an automorphism group which contains the above assertions and their inverses.

Both symmetry and serializability are examples of non-behavioral properties, i.e., properties determined by the structure of the program. They are not necessarily related to the intended result of the computation. Relative safety is potentially useful for specifying many other useful non-behavioral properties, possibly ad hoc and application-specific. The class of such properties is potentially large. It is intuitively clear that such information can help in speeding up the proofs of other properties, as we demonstrate later.

Fig. 7. Producer/Consumer.
Consumer: while (true) do ⟨0⟩ await (full = 1) full := 0; ⟨1⟩ con1(); ... ; ⟨n⟩ conn() end
Producer: while (true) do ⟨0⟩ pro1(); ... ; ⟨n−1⟩ pron(); ⟨n⟩ await (full = 0) full := 1 end

Fig. 8. Partial CLP Representation of Producer/Consumer.
p([0, 0], 0, X).   % initial state
% Consumer:
p([1, P2], 0, X) ← p([0, P2], 1, X).
p([2, P2], Full, X) ← p([1, P2], Full, X).
...
p([n, P2], Full, X) ← p([n − 1, P2], Full, X).
p([0, P2], Full, X) ← p([n, P2], Full, X).
% Producer:
p([P1, 1], Full, X) ← p([P1, 0], Full, X).
...
p([P1, n], Full, X) ← p([P1, n − 1], Full, X).
p([P1, n + 1], Full, X) ← p([P1, n], Full, X).
p([P1, 0], 1, X) ← p([P1, n + 1], 0, X).

5 The Proof Method

Now let G = (B1, ..., Bn, φ) and P denote a goal and a program, respectively. Let R = A ← C1, ..., Cm, φ1 denote a rule in P, written so that none of its variables appears in G. Let the equation A = B be shorthand for the pairwise equation of the corresponding arguments of A and B. A reduct of G using R, denoted reduct(G, R), is of the form

(B1, ..., B(i−1), C1, ..., Cm, B(i+1), ..., Bn, Bi = A ∧ φ ∧ φ1)

provided the constraint Bi = A ∧ φ ∧ φ1 has a true ground instance. Since CLP rules are implications, it follows that G ← reduct(G, R) holds.

Definition 3 (Unfold). Given a program P and a goal G containing one atom, a complete unfold of G, denoted unfold(G), is the set {G' | ∃R ∈ P : G' = reduct(G, R)}. A (not necessarily complete) unfold of G is a set unfold'(G) ⊆ unfold(G).

Note that [[G]] ≠ ∅ only if G ∩ TP([[∨unfold(G)]]) ≠ ∅, and this holds only if [[∨unfold(G)]] ≠ ∅; hence we have the logical semantics of unfold: G → ∨unfold(G).

Definition 4 (Unfold Tree Goals). Given a program P and a set H of goals, each containing one atom, we define the function δ(H) = H ∪ unfold'(G1), where G1 ∈ H. We obtain a set of unfold tree goals of G by finitely many successive applications of δ to {G}. Since for any goal G, G ← reduct(G, R), for any goal G1 in the unfold tree goals of G, we have G1 → G.

Definition 5 (Frontier). Given a program P and a set H of goals, each containing one atom, when there exists G1 ∈ H we define the nondeterministic function ε(H) = (H − {G1}) ∪ unfold(G1). ε can be applied successively to a singleton set containing an initial goal G, obtaining a frontier F = ε(... (ε({G})) ...). From the logical semantics of unfold, for any frontier F of G, G → ∨F.
Fig. 9. Informal structure of the proof process: GL is fully unfolded into GL1, ..., GLn, GR is partially unfolded into GR1, ..., GRm, and the obligation becomes GL1 ∨ ... ∨ GLn |= GR1 ∨ ... ∨ GRm, with coinduction closing recurring goals.

Intuitively, in order to prove GL |= GR, we proceed as follows: unfold GL completely to obtain a frontier containing the goals GL1, ..., GLn, and unfold GR (not necessarily completely), obtaining the unfold tree goals GR1, ..., GRm. This is depicted in Figure 9. The proof then holds if

GL1 ∨ ... ∨ GLn |= GR1 ∨ ... ∨ GRm

or alternatively, if GLi |= GR1 ∨ ... ∨ GRm for all 1 ≤ i ≤ n. The justification for this result comes from the logical semantics of unfold: we have that GL → GL1 ∨ ... ∨ GLn, and GRj → GR for all j such that 1 ≤ j ≤ m. By a chain of implications we may conclude GL |= GR. More specifically, but with some loss of generality, the proof holds if

∀i : 1 ≤ i ≤ n, ∃j : 1 ≤ j ≤ m : GLi |= GRj

and for this reason, our proof obligations shall be defined below to be simply pairs of goals, written GLi |= GRj. Note that since we replace the global satisfaction criterion by local criteria, our proof method is incomplete in cases where we need to perform some unfolds of GR, that is, when proving relative safety assertions. Unfolding of GR is not needed for proving traditional safety assertions.

Our proof method can also be viewed as checking that the set of states represented by the symbolic formula GR is reachable whenever the set GL is reachable. This is done by showing that a frontier of states that reach GL also reaches GR. If GL is to be reachable from the initial state, it must be through at least one of the states in this frontier. And since GR is reachable from all states in the frontier, GR must also be reachable from the initial state.
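The search implied by Figure 9 can be phrased as a worklist loop over proof obligations. The following is a highly simplified sketch (ours; the actual prototype is a CLP(R) program with tabling, see Section 6), anticipating the rules of Section 5.1: goals are opaque values, and unfold_left and entails are supplied as parameters, standing in for Definitions 3-5 and for entailment checking up to renaming. Right unfolding (the RU rule) is omitted for brevity.

def prove(gl, gr, unfold_left, entails, max_depth=20):
    """Attempt to discharge GL |= GR in the style of Fig. 9 / Fig. 10."""
    obligations = [(gl, gr, frozenset(), 0)]   # (GL, GR, assumptions Ã, depth)
    while obligations:
        g1, g2, assumed, depth = obligations.pop()
        if entails(g1, g2):                    # direct proof (DP)
            continue
        if any(entails(g1, a_l) for (a_l, a_r) in assumed):
            continue                           # coinduction (AP), simplified
        if depth >= max_depth:
            return False                       # give up: result unknown
        # left unfold + coinduction (LU+C): assume GL |= GR for the children
        children = unfold_left(g1)
        new_assumed = assumed | {(g1, g2)}
        obligations.extend((c, g2, new_assumed, depth + 1) for c in children)
    return True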
5.1 Proof Rules

We now present a calculus for proving relative safety assertions. To handle the possibly infinite unfoldings of GL and GR (see Figure 9), we depend on the use of coinduction for the unfolding of GL. A proof by coinduction proceeds by assuming everything we like, as long as we do not violate any facts. While assuming a set of assertions of the form GL |= GR collected along an unfold path, we prove another assertion on the path, making it unnecessary to unfold the path further. For the use of coinduction, we now give the following definition.

Definition 6 (Proof Obligation). A proof obligation is of the form Ã ⊢ GL |= GR, where GL and GR are goals and Ã is a set of assertions that are assumed.

The role of proof obligations is to capture the state of a proof. The set Ã contains assertions that can be used coinductively to discard the proof obligation at hand. Our proof rules are presented in Figure 10. Each rule operates on a (possibly empty) set of proof obligations Π, by selecting a proof obligation from Π and attempting to discharge it. In this process, new proof obligations may be produced. The proof process is typically centered around unfolding the goals in proof obligations. The left unfold and coinduction (LU+C) rule performs a complete unfold on the lhs of a proof obligation, producing a new set of proof obligations. The original assertion, while removed from Π, is added as an assumption to every newly produced proof obligation, opening the door to using coinduction in the proof.

Fig. 10. Proof Rules.
(LU+C) Π ∪ {Ã ⊢ GL |= GR} ⟹ Π ∪ ⋃(i=1..n) {Ã ∪ {GL |= GR} ⊢ GLi |= GR}, where unfold(GL) = {GL1, ..., GLn}.
(RU) Π ∪ {Ã ⊢ GL |= GR} ⟹ Π ∪ {Ã ⊢ GL |= GRi}, where GRi ∈ unfold(GR).
(AP) Π ∪ {Ã ⊢ (GL, φ) |= GR} ⟹ Π ∪ {Ã ⊢ (GR1θ, φ) |= GR}, where GL1 |= GR1 ∈ Ã and there exists a renaming θ such that GL |= GL1θ.
(DP) Π ∪ {Ã ⊢ (GL, φ) |= GR} ⟹ Π, where there exists a renaming θ such that (GL, φ) |= GRθ.
(SPL) Π ∪ {Ã ⊢ GL |= GR} ⟹ Π ∪ ⋃(i=1..k) {Ã ⊢ (GL, φi) |= GR}, where φ1 ∨ ... ∨ φk is true.

Example 7 (Proving Symmetry). We exemplify our proof rules by proving the symmetry property of the 2-process bakery algorithm (Figure 2):

p([P1, P2], T1, T2) |= p([P2, P1], T2, T1).    (1)

Initially, Π = {∅ ⊢ p([P1, P2], T1, T2) |= p([P2, P1], T2, T1)}. Using the rule LU+C, and all the CLP rules of Figure 2, we perform a left unfold of GL = p([P1, P2], T1, T2), obtaining a new set of proof obligations Π'. In particular, the unfold with CLP rule r1 contributes the obligation (O1):

Ã ⊢ p([P1, P2], T1, T2), P1' = 1, P1 = 0, T1' = T2 + 1 |= p([P2, P1'], T2, T1'),

where Ã = {p([P1, P2], T1, T2) |= p([P2, P1], T2, T1)}. The unfold with CLP rule init contributes the obligation (O2):

Ã ⊢ P1 = P2 = 0, T1 = T2 = 0 |= p([P2, P1], T2, T1).

Besides these two obligations, Π' also includes the results of unfolding with the rules e1, x1, r2, e2, and x2.

The rule right unfold (RU) performs an unfold operation on the rhs of a proof obligation. Note that only one unfolded goal is used. In practice, it is generally not known which reduct GRi of GR is the one we need later, or indeed whether GR itself is needed later. Returning to our example, by unfolding the rhs of (O1) using proof rule RU and CLP rule r2 of Figure 2, we obtain Π'' which includes (O3):

Ã ⊢ p([P1, P2], T1, T2), P1' = 1, P1 = 0, T1' = T2 + 1 |= p([P2, P1], T2, T1), P1'' = 1, P1 = 0, T1'' = T2 + 1.

Similarly, by unfolding the rhs of (O2) using RU and the CLP rule init, we obtain the obligation (O4):

Ã ⊢ P1 = P2 = 0, T1 = T2 = 0 |= P1 = P2 = 0, T1 = T2 = 0.

The rule assumption proof (AP) transforms an obligation by using an assumption, and realizes the coinduction principle (since assumptions can only be created by the rule LU+C). Continuing our example, we can now prove (O3) by rule AP, applying the original symmetry assertion (1), which is included in the set of assumed assertions Ã of (O3). More concretely, we apply (1) to the lhs of (O3), obtaining the goal p([P2, P1], T2, T1), P1' = 1, P1 = 0, T1' = T2 + 1, which clearly implies the rhs of (O3) by renaming each double-primed variable to its single-primed version.

The rule direct proof (DP) discards a proof obligation when it can be directly proven to hold, possibly after some renaming of variables. This rule is used to discharge (O4), since it clearly holds; the renaming θ that we apply here is the identity. Finally, the rule split (SPL) converts a proof obligation into several more specialized ones.

Given an assertion GL |= GR, a proof starts with Π = {∅ ⊢ GL |= GR} and proceeds by repeatedly applying the rules of Figure 10. The conditions under which a proof is complete are stated in the following theorem.

Theorem 1 (Proof of Assertions). A safety assertion GL |= GR holds if, starting with the set of proof obligations Π = {∅ ⊢ GL |= GR}, there exists a sequence of applications of proof rules that results in Π = ∅. The safety assertion holds conditionally on Ã if we start with Π = {Ã ⊢ GL |= GR}, where Ã ≠ ∅.
start with Π = {Ã ⊢ GL |= GR}, where Ã ≠ ∅.

Our proof method can be used to prove a traditional safety assertion GL |= Ψ, to prove a relative safety assertion GL |= GR where GR contains an atom, and to prove a traditional safety assertion using other assertions, e.g., relative safety assertions representing symmetry, possibly obtaining a smaller proof. For the last use, we start the proof of a traditional safety assertion with a non-empty set of assumed assertions.

The proof rules above are sufficient in principle for our purposes. However, there is a very important principle which gives rise to an optimization: redundancy between obligations. Its essential idea is based on the observation that, in proving GL |= GR, we may obtain a goal GLi by a sequence of unfolds from GL and prove the obligation GLi |= GR. Using this, we can try to establish GLj, φ |= GR in another part of the tree, where i ≠ j and there exists a renaming θ such that GLj |= GLiθ. Here, we reuse the proof of GLi |= GR in the proof of GLj, φ |= GR.

A fundamental question in proving a relative safety assertion GL |= GR in general is how to interleave the unfolding of the lhs versus the rhs. For this, we can repeatedly apply left-unfolding on GL either until “looping”, that is, until each path in the tree contains a repeated occurrence of a program counter, or until the final goal of the path is a constraint. This is because coinduction is likely to be applicable at a looping point.

6 Implementation and Experiments

We implemented our proof algorithm as regular CLP(R) [15] programs. Our prototype implementations use coinduction, and a tabling mechanism for storing assumed assertions. We ran our prototypes on a 2 GHz Pentium 4 Xeon machine with 2 GB of RAM.

Our first prototype is for proving relative safety assertions. Here we hope that the symmetry proof using coinduction concludes in just one level of unfold of both the lhs and the rhs of the assertion, because this is the case for perfectly symmetric programs. These include the bakery algorithm and the dining philosophers' problem. In these examples, every transition from state s to t has a symmetric counterpart that maps π(s) to π(t), where π is an automorphism of states. Our implementation therefore first tries to check goals obtained from one level of both lhs and rhs unfolding. For each goal in the lhs frontier, it searches for a goal in the rhs of depth 1 such that the original symmetry assertion is applicable coinductively. Where the proof does not conclude in this manner, we have a program with imperfect symmetry, as is the case with simple priority mutual exclusion and Szymanski's algorithm. In this case, a general depth-first traversal of the lhs subtree is initiated. For the producer-consumer problem, we do not perform any lhs unfolding.

Experimental results on proving relative safety assertions are shown in Table 1, where A# = number of verified assertions, LSt = number of visited lhs goals, RSt = number of visited rhs goals, and T = time in seconds. In ProblemName-N, N denotes the number of processes, except for Prod/Cons-N, where N denotes that there are N produce and consume operations. Note that we could not complete the experiments for the 6-process bakery algorithm and the 3-process Szymanski's algorithm after a few hours.

We also implemented a second prototype to prove safety assertions of the form G |= false, with or without assumed relative safety assertions (e.g., symmetry). G |= false declares the non-reachability of the error states G.
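The left-unfold strategy just described can be sketched as follows. This is our illustration; unfold_left, pcs and is_constraint are hypothetical interfaces to the underlying CLP machinery, not functions from the paper.

def unfold_until_looping(goal, unfold_left, pcs, is_constraint):
    """Depth-first left-unfolding of `goal`, stopping a path as soon as
    it repeats a tuple of program counters (a likely coinduction point)
    or ends in a pure constraint. Returns the frontier goals."""
    frontier = []
    stack = [(goal, {pcs(goal)})]       # goal plus counters seen on its path
    while stack:
        g, seen = stack.pop()
        for g2 in unfold_left(g):
            if is_constraint(g2) or pcs(g2) in seen:
                frontier.append(g2)     # looping point: stop unfolding here
            else:
                stack.append((g2, seen | {pcs(g2)}))
    return frontier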
A coinductive verification requires a matching between the goal in an assertion and an assumed assertion, such that the said assertion can be proven coinductively. As is common in the literature, for verification using symmetry we need to define a set of canonical representatives of the equivalence classes of goals induced by a given symmetry, such that the matching can be done efficiently among representatives. Unfortunately, finding all the canonical representatives of a goal is a hard problem, known as the orbit problem [2]. Our solution here is to generate canonical representatives of a goal only up to a constant number, and we employ a sorting algorithm as our canonicalization function. We note, however, that canonicalization is not hard for the dining philosophers' problem, since for this problem it is a cyclic shift, which is linear in the permutable domain size (cf. [2]). Note also that neither sorting nor cyclic shift is necessary when using serializability assertions.

Table 1. Relative Safety Proof Experimental Results

Problem        A#  LSt  RSt   T        Problem        A#  LSt  RSt    T
Bakery-2       1   9    27    0.00     Philosopher-4  1   24   232    0.02
Bakery-3       2   44   254   0.10     Priority       1   43   220    0.04
Bakery-4       3   147  1557  11.28    Szymanski-2    8   362  28419  59.11
Bakery-5       4   424  7804  2320.3   Szymanski-3    16  ∞    ∞      ∞
Bakery-6       5   ∞    ∞     ∞        Prod/Cons-10   2   0    170    0.19
Philosopher-3  1   19   124   0.01     Prod/Cons-20   2   0    530    1.88

The results are shown in Table 2 (a). The proof of traditional safety does not require right unfolding, hence there is no RSt column. We ran the bakery, Peterson's, Lamport's fast mutual exclusion, and Szymanski's algorithms, proving mutual exclusion. Note that we do not prove the symmetry assertions of some of the problems (e.g., Szymanski-3). For the dining philosophers' problem, we prove that there cannot be more than N/2 philosophers simultaneously eating. For the producer-consumer problem, each pro_i() increments a variable x, and con_j() decrements it; here we verify that the value of x can never be more than 2n.

Table 2. Safety Proof Experimental Results

(a) Stored Assertions and Time

               CLP/Coinductive Tabling                 Delzanno-Podelski
               No Assertion        W/ Assertion
Problem        LSt       T         LSt       T         # Facts
Bakery-2       15        0.00      8         0.00      13
Bakery-3       296       0.07      45        0.01      109
Bakery-4       4624      6.60      191       0.20      963
Bakery-5       ∞         ∞         677       2.88
Bakery-6       ∞         ∞         2569      49.08
Bakery-7       ∞         ∞         11865     1052.32
Peterson-2     105       0.05      10        0.00
Peterson-3     20285     119.03    175       0.15
Peterson-4     ∞         ∞         3510      11.98
Lamport-2      143       0.02      72        0.02
Lamport-3      4255      1.13      707       0.40
Lamport-4      ∞         ∞         5626      7.63
Szymanski-2    240       0.08      84        0.02
Szymanski-3    10883     35.43     3176      2.91
Philosopher-3  882       0.51      553       0.30
Philosopher-4  4293      27.77     2783      9.67
Prod/Cons-10   664       0.10      171       0.02
Prod/Cons-20   2314      1.90      331       0.04

(b) % Reduction

Problem Type  LSt   T
Bakery        76%   78%
Peterson      95%   99.9%
Lamport       67%   65%
Szymanski     68%   83%
Philosopher   36%   53%
Prod/Cons     87%   94%

The bakery algorithm has infinitely many reachable states, and therefore cannot be handled by finite-state model checkers. We compare our search space with the results of the CLP-based system of Delzanno and Podelski [4]. As also noted by Delzanno and Podelski, the problem does not scale well to a larger number of processes, but using symmetry we have pushed its verification limit to 7 processes without abstraction. In Table 2 (b) we summarize the effectiveness of the use of a variety of relative safety assertions.
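The sorting-based canonicalization described at the beginning of this section admits a very small sketch. We assume here that a goal over N symmetric processes is represented as a list of per-process components, e.g. (program counter, ticket) pairs for the bakery algorithm; this representation is our assumption, not the paper's.

def canonical_full(components):
    """Canonical representative under full symmetry: any permutation of
    the per-process components denotes the same goal, so sorting picks
    one representative per orbit."""
    return tuple(sorted(components))

def canonical_rotational(components):
    """Canonical representative under rotational symmetry (dining
    philosophers): only cyclic shifts are symmetries, so take the
    lexicographically least rotation, linear in the number of rotations
    tried."""
    n = len(components)
    rotations = (tuple(components[i:] + components[:i]) for i in range(n))
    return min(rotations)

# Two bakery goals that are permutations of each other get the same key:
assert canonical_full([(2, 7), (0, 0)]) == canonical_full([(0, 0), (2, 7)])

Sorting gives one representative per goal in O(N log N), which is how the general orbit problem is sidestepped at the price of handling full (or cyclic) symmetry only.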
The use of symmetry assertions effectively reduces the search space of perfectly symmetric problems (bakery, Peterson's, Lamport's fast mutex, dining philosophers). However, the reduction for Szymanski's algorithm, which is only imperfectly symmetric, is competitive with that for the perfectly symmetric problems, showing that “not-quite” symmetry reduction is worth pursuing. The use of rotational symmetry in the dining philosophers' problem is, expectedly, less effective. We also note that we managed to obtain a substantial reduction of the state space for the producer/consumer problem. The reduction in time roughly corresponds to that of the state space. Finally, comparing Tables 1 and 2, the proof of relative safety assertions is no easier than the proof of traditional safety assertions, even with coinduction. This is because of the need to perform rhs unfolding when proving relative safety.

7 Conclusion

In this paper, we introduced a novel assertion called relative safety. This can be uniquely used to assert structural properties of programs. We chose symmetry as a driving application area, and demonstrated that, by using relative safety, we could accommodate a larger class of programs than has previously been considered by other means. We provided a proof system, based upon the well-understood computational steps of unfolding, and introduced a new coinductive tabling mechanism. We then ran experiments in order to show the practical potential of our algorithm. Further work is to discover more important classes of structural properties for which relative safety can be used.

References

1. M. Abadi and L. Lamport. An old-fashioned recipe for real time. ACM TOPLAS, 16(5):1543–1571, September 1994.
2. E. M. Clarke, E. A. Emerson, S. Jha, and A. P. Sistla. Symmetry reductions in model checking. In A. J. Hu and M. Y. Vardi, editors, 10th CAV, volume 1427 of LNCS, pages 147–158. Springer, 1998.
3. E. M. Clarke, T. Filkorn, and S. Jha. Exploiting symmetry in temporal logic model checking. In 5th CAV, volume 697 of LNCS, pages 450–462. Springer, 1993.
4. G. Delzanno and A. Podelski. Constraint-based deductive model checking. Int. J. STTT, 3(3):250–270, 2001.
5. X. Du, C. R. Ramakrishnan, and S. A. Smolka. Tabled resolution + constraints: A recipe for model checking real-time systems. In 21st RTSS, pages 175–184. IEEE Computer Society Press, 2000.
6. E. A. Emerson. From asymmetry to full symmetry: New techniques for symmetry reductions in model checking. In L. Pierre and T. Kropf, editors, 10th CHARME, volume 1703 of LNCS, pages 142–156. Springer, 1999.
7. E. A. Emerson, J. Havlicek, and R. J. Trefler. Virtual symmetry reduction. In 15th LICS, pages 121–131. IEEE Computer Society Press, 2000.
8. E. A. Emerson and A. P. Sistla. Model checking and symmetry. In 5th CAV, volume 697 of LNCS, pages 463–478. Springer, 1993.
9. E. A. Emerson and A. P. Sistla. Utilizing symmetry when model-checking under fairness assumptions. ACM TOPLAS, 19(4):617–638, July 1997.
10. F. Fioravanti, A. Pettorossi, and M. Proietti. Verifying CTL properties of infinite-state systems by specializing constraint logic programs. In M. Leuschel, A. Podelski, C. R. Ramakrishnan, and U. Ultes-Nitsche, editors, 2nd VCL, pages 85–96, 2001.
11. L. Fribourg. Constraint logic programming applied to model checking. In 9th LOPSTR, volume 1817 of LNCS, pages 30–41. Springer, 1999.
12. G. Gupta and E. Pontelli. A constraint-based approach for specification and verification of real-time systems. In 18th RTSS, pages 230–239. IEEE Computer Society Press, 1997.
13. C. N. Ip and D. L.
Dill. Better verification through symmetry. FMSD, 9(1/2):41–75, 1996. 14. J. Jaffar and M. J. Maher. Constraint logic programming: A survey. J. LP, 19/20:503–581, May/July 1994. 15. J. Jaffar, S. Michaylov, P. J. Stuckey, and R. H. C. Yap. The CLP(R ) language and system. ACM TOPLAS, 14(3):339–395, 1992. 16. J. Jaffar, A. Santosa, and R. Voicu. A CLP proof method for timed automata. In 25th RTSS, pages 175–186. IEEE Computer Society Press, 2004. 17. M. Leuschel and T. Massart. Infinite-state model checking by abstract interpretation and program specialization. In 9th LOPSTR, volume 1817 of LNCS, pages 62–81. Springer, 1999. 18. U. Nilsson and J. Lübcke. Constraint logic programming for local and symbolic model checking. In J. W. Lloyd, V. Dahl, U. Furbach, M. Kerber, K.-K. La u, C. Palamidessi, L. M. Pereira, Y. Sagiv, and P. J. Stuckey, editors, 1st CL, volume 1861 of LNCS, pages 384–398. Springer, 2000. 19. Y. S. Ramakrishna, C. R. Ramakrishnan, I. V. Ramakrishnan, S. A. Smolka, T. Swift, and D. S. Warren. Efficient model checking using tabled resolution. In O. Grumberg, editor, 9th CAV, volume 1254 of LNCS, pages 143–154. Springer, 1997. 20. A. P. Sistla and P. Godefroid. Symmetry and reduced symmetry in model checking. ACM TOPLAS, 26(4):702–734, July 2004. 21. A. P. Sistla, V. Gyuris, and E. A. Emerson. SMC: A symmetry-based model checker for verification of safety and liveness properties. ACM TOSEM, 9(2):133–166, April 2000. 22. F. Wang. Efficient data structure for fully symbolic verification of real-time systems. In S. Graf and M. I. Schwartzbach, editors, 6th TACAS, volume 1785 of LNCS, pages 157–171. Springer, 2000. 23. H. Weyl. Symmetry. Princeton University Press, 1952. Resource Usage Analysis for the π-Calculus Naoki Kobayashi1, Kohei Suenaga2 , and Lucian Wischik3 1 Tohoku University koba@ecei.tohoku.ac.jp 2 University of Tokyo kohei@yl.is.s.u-tokyo.ac.jp 3 Microsoft Corporation lwischik@microsoft.com Abstract. We propose a type-based resource usage analysis for the πcalculus extended with resource creation/access primitives. The goal of the resource usage analysis is to statically check that a program accesses resources such as files and memory in a valid manner. Our type system is an extension of previous behavioral type systems for the pi-calculus, and can guarantee the safety property that no invalid access is performed, as well as the property that necessary accesses (such as the close operation for a file) are eventually performed unless the program diverges. A sound type inference algorithm for the type system is also developed to free the programmer from the burden of writing complex type annotations. Based on the algorithm, we have implemented a prototype resource usage analyzer for the π-calculus. To the authors’ knowledge, ours is the first type-based resource usage analysis that deals with an expressive concurrent language like the π-calculus. 1 Introduction Computer programs access many external resources, such as files, library functions, device drivers, etc. Such resources are often associated with certain access protocols; for example, an opened file should be eventually closed and after the file has been closed, no read/write access is allowed. The aim of resource usage analysis [9] is to statically check that programs conform to such access protocols. 
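For instance, the file discipline just described can be phrased as a small automaton over access labels. The sketch below is our own illustration of what "conforming to an access protocol" means on a single trace; it is not the paper's formalism.

# States of a toy file protocol: 0 = closed (initial), 1 = open.
# A trace is a sequence of accesses from {"open", "read", "write", "close"}.
STEP = {
    (0, "open"): 1,
    (1, "read"): 1,
    (1, "write"): 1,
    (1, "close"): 0,
}

def check_trace(trace):
    """True iff every access is allowed and the file ends up closed
    (the 'eventually closed' obligation)."""
    state = 0
    for access in trace:
        if (state, access) not in STEP:
            return False          # invalid access, e.g. read after close
        state = STEP[(state, access)]
    return state == 0             # every opened file was closed

assert check_trace(["open", "read", "write", "close"])
assert not check_trace(["open", "close", "read"])   # read after close
assert not check_trace(["open", "read"])            # never closed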
Although a number of approaches, including type systems and model checking, have been proposed so far for resource usage analysis or similar analyses [1, 5–7, 9], most of them focused on the analysis of sequential programs, and did not treat concurrent programs, especially those involving dynamic creation and passing of channels and resources.

In the present paper, we propose a type-based method of resource usage analysis for concurrent languages. Dealing with concurrency is especially important because concurrent programs are hard to debug, and also because actual programs accessing resources are often concurrent. We use the π-calculus (extended with resource primitives) as a target language, so that our analysis can be applied to a wide range of concurrency primitives (including those for dynamically creating and passing channels) in a uniform manner.

For the purpose of analyzing resource usage, we extend previous behavioral type systems for the π-calculus [3, 8]. The idea of the behavioral types [3, 8] is to use CCS-like processes as types. The types express the abstract behavior of processes, so that certain properties of processes can be verified by verifying the corresponding properties of their types, using, for example, model checking techniques. The latter properties (of CCS-like types) are more amenable to automatic verification techniques like model checking than the former ones, because the types do not have channel mobility, and also because the types typically represent only the behavior of a part of the entire process. Following the previous behavioral types, we use CCS-like types to express the resource-wise access behaviors of a process, and construct a type system which guarantees that any well-typed process uses resources in a valid manner. The main contributions of the present paper are:

– Adaptation of behavioral types (for the pure π-calculus) [3, 8] to the π-calculus extended with resource access primitives.
– Realization of fully automatic verification (while making the analysis more precise than [8]). Igarashi and Kobayashi [8] gave only an abstract type system, without giving a concrete type inference algorithm, and Chaki et al. [3] require type annotations. The full automation was enabled by a combination of a number of small ideas, such as the inclusion of hiding and renaming as type constructors (Igarashi and Kobayashi [8] used a fragment without hiding and renaming, and Chaki et al. [3] used a fragment without renaming) and the approximation of a CCS-like type by a Petri net (to reduce the problem of checking conformance of inferred types to a resource usage specification).
– Verification not only of the usual safety property that an invalid resource access does not occur, but also of an extended safety property (which we call partial liveness) that necessary resource accesses (e.g., the closing of a file) are eventually performed unless the whole process diverges. Partial liveness is not guaranteed by Chaki et al.'s type system [3]. A noteworthy point about our type system for guaranteeing partial liveness is that it is parameterized by a mechanism that guarantees deadlock-freedom (in the sense of Kobayashi's definition [13]). So, our type system can be combined with any mechanism (model checking, abstract interpretation, another type system, or whatever) to verify deadlock-freedom.
– Implementation of a prototype resource usage analyzer based on the proposed method. The implementation can be tested at http://www.yl.is.s. u-tokyo.ac.jp/~kohei/usage-pi/. The rest of this paper is structured as follows. Section 2 introduces an extension of the π-calculus with primitives for creating and accessing resources. Section 3 introduces a type system for resource usage analysis. Section 4 gives a type inference algorithm for the type system. Section 5 presents our prototypical implementation. Section 6 discusses related work. Section 7 concludes. For lack of space, proofs and some explanations have been omitted. They are found in the full version of this paper [15]. 300 2 N. Kobayashi, K. Suenaga, and L. Wischik Language Let x, y, z range over a countably infinite set Var of variables, let values v range over variables and also the two constant values true and false, let tags t range over {∅, c}, let ξ range over a set of access labels, and let Φ (called a trace set ) denote a set of sequences of access labels, possibly ending with a special label ↓, that is closed under the prefix operation. We write x  for a sequence x1 , . . . , xn of variables, and similarly  v , and define Φ−ξ = {s | ξs ∈ Φ}. Let L range over reduction labels {xξ | x ∈ Var} ∪ {τ }. P ::= 0 | (P | Q) | if v then P else Q | (νx) P | ∗P | xt e v . P | xt (e y ). P | (NΦ x)P | accξ (x).P Structural preorder  is as follows. P ≡ Q stands for (P  Q) ∧ (Q  P ). P |0 ≡ P P |Q ≡ Q|P P | (Q | R) ≡ (P | Q) | R Φ ∗P  ∗P | P Φ (νx) P | Q  (νx) (P | Q) and (N x)P | Q  (N x)(P | Q)if x not free in Q P  P  Q  Q P | Q  P  | Q P Q (νx) P  (νx) Q P Q (NΦ x)P  (NΦ x)Q L L Labeled relation −→ is as follows. Write P −→ Q when P −→ Q for some L, and −→∗ for reflexive and transitive closure of −→. Define target(xξ ) = {x} and target(τ ) = ∅. τ xt e z . P | xt (e y ). Q −→ P | [e z /e y ]Q L L P −→ Q P −→ Q L L P −→ Q (NΦ x)P (νx) P −→ (νx) Q x ∈ target(L) L −→ (NΦ x)Q xξ x ∈ target(L) L P | R −→ Q | R τ if true then P else Q −→ P τ if false then P else Q −→ Q xξ accξ (x).P −→ P P  P P −→ Q τ (NΦ x)P −→ (NΦ−ξ x)Q L P  −→ Q Q  Q L P −→ Q Fig. 1. Process language The process language P is in Figure 1. The first line is standard π-calculus – (νx) P declares a new channel with bound name x, ∗P is replication, and parallel composition | binds less tightly than the prefixes. We often omit trailing 0. Bound and free variables are defined as normal. We identify processes up to αconversion, and so assume that bound variables are always different from each other and from free variables. y ). P waits for input on channel x with bound formal The input command xt ( parameters y and then behaves as P . The output command xt  v . P sends  v along x and then behaves as P . The attribute t is either c (indicating that if the input is executed then it will succeed unless the whole process diverges) or ∅ (which does not give the guarantee). We often omit ∅. Note that the attributes do not affect the operational semantics of processes. Typically the attributes Resource Usage Analysis for the π-Calculus 301 have been inferred by some deadlock analysis tool such as TyPiCal [10–12, 14]. For this paper, we assume that the correctness of the attributes are ensured by whichever deadlock-analysis tool used to make the annotations. For the final line in the definition of processes, (NΦ x)P declares a resource with bound name x which is to be accessed according to specification Φ, and accξ (x).P performs access ξ on resource x and then behaves like P . 
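The role of Φ and of the operation Φ−ξ = {s | ξs ∈ Φ} defined above can be illustrated by a small monitor. In the sketch below (ours, not the paper's), Φ is represented extensionally as a finite set of label strings, whereas in the paper Φ may be an infinite, e.g. regular, set.

def derivative(phi, xi):
    """Phi - xi = { s | xi s in Phi }: the specification remaining after
    performing access xi. Traces are strings of access labels, possibly
    ending in the label '↓', which marks that the pending obligations
    (e.g. a required close) have been met."""
    return {s[1:] for s in phi if s[:1] == xi}

# Prefix closure of I C ↓ (initialize, then close, obligation met):
phi = {"", "I", "IC", "IC↓"}

phi = derivative(phi, "I")        # valid access:    {"", "C", "C↓"}
assert phi
phi = derivative(phi, "C")        # complete access: {"", "↓"}
assert "↓" in phi                 # the process may safely stop here
assert not derivative(phi, "R")   # invalid: read after close gives the empty set

An access is thus invalid exactly when it empties the specification, matching the reduction rule for (N^Φ x)P.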
Resources here are an abstraction of real-world resources such as files or objects. In this paper we consider accesses such as I for initialize, R for read, W for write and ∗ # C for close. For example, (N(I(R+W ) C ↓) x)P creates a resource that should be first initialized, read or written an arbitrary number of times, and then closed. The symbol ↓ at the end indicates that the final close is required eventually to occur. Here, (S)# is the prefix closure of S, i.e., {s | ss ∈ S}. We write  for the empty access sequence. We write init(x).P for accI (x).P , and similarly read(x), write(x), close(x). We do not fix the syntax of Φ. Our type system is independent of the choice of the language for describing the specificaiton Φ (except for the sub-algorithm for type-checking discussed in Section 4.1, where we assume that Φ is a regular language). We treat resources as primitives in this paper, and give operational semantics where accξ (x).P is non-blocking. This is for simplicity. It would also be possible to treat a resource with (say) three access labels as a tuple of three channels. This would allow previous work [3, 8] to infer some of the properties of this paper, albeit with less precision and more complexity. Also in this paper we have specifications Φ apply only to a single resource. To model a program with two co-declared resources as in [8] with intertwined specifications, we would instead merge them into a single resource with a single specification. The operational semantics of the language are given in Figure 1, through a L structural preorder  and a labeled reduction relation −→. Notice that invalid resource access sets Φ = ∅, valid access removes a prefix from Φ, and complete access results in Φ = {, ↓}). (N(IC ↓) x)read(x).0 → (N∅ x)0 # (N(IC ↓) x)init(x).0 → (N(C ↓) x)0 # (IC ↓)# (N # {,↓} x)init(x).close(x).0 → (N x)0 (invalid access) (valid access) (complete access) We are concerned with the following properties. Definition 1. 1. A process P is safe if it does not contain a sub-expression of the form (N∅ x)Q. Φ  2. A process P is partially live if ↓ ∈ Φ whenever P −→∗  ( ν N)(N x)Q −→. The first property means that the process has not performed any invalid access. The second property means that necessary accesses are eventually performed before the whole process converges. In the next section, we shall develop a type system that guarantees the safety and partial liveness. Example 1. The following example process is safe and partially live. It uses internal synchronization to ensure that the resource x is accessed in a valid order. 302 N. Kobayashi, K. Suenaga, and L. Wischik  ∗ # (N(IR C) x)(νy) (νz) init(x).(y   | y ) /* | yc ( ). read(x).z   /* | yc ( ). read(x).z   /* | zc ( ). zc ( ). close(x) /* 3 3.1 initialize x, and send signals */ wait on y, then read x, and signal on z*/ wait on y, then read x, and signal on z*/ wait on z, then close x */ Type System Types We first introduce the syntax of types. We use two categories of types: value types and behavioral types. The latter describes how a process accesses resources and communicates through channels. As mentioned in Section 1, we use CCS processes for behavioral types. Definition 2 (types). The sets of value types σ and behavioral types A are defined by: σ ::=bool | res | chan(x1 : σ1 , . . . , xn : σn )A A::=0 | α | at .A | xξ .A | τt .A | (A1 | A2 ) | A1 ⊕ A2 | ∗A | y1 /x1 , . . . 
, yn /xn A | (νx) A | µα.A | A↑S | A↓S a (communication labels) ::= x | x A behavioral type A, which is a CCS process, describes what kind of communication and resource access a process may perform. 0 describes a process that performs no communication or resource access. The types xt . A, xt . A, xξ .A and τt .A describes process that first perform an action and then behave according to A; the actions are, respectively, an input on x, an output on x, an access operation ξ on x, and the invisible action. Attributes t denote whether an action is guaranteed to succeed. A1 | A2 describes a process that performs communications and resource access according to A1 and A2 , possibly in parallel. A1 ⊕A2 describes a process that behaves according to either A1 or A2 . ∗A describes a process that behaves like A an arbitrary number of times, possibly in parallel. y / xA, denotes simultaneous renaming of y1 /x1 , . . . , yn /xn A, abbreviated to  x  with y in A. (νx) A describes a process that behaves like A for some hidden channel x. For example, (νx) (x. y | x) describes a process that performs an output on y after the invisible action on x. The type µα.A describes a process that  behaves like a recursive process defined by α = A. The type A↑S describes a process that behaves like A, except that actions whose targets are in S are replaced by the invisible action τ , while A↓S describes a process that behaves like A, except that actions whose targets are not in S are replaced by τ . The formal semantics of behavioral types is defined later using labeled transition semantics. As for value types, bool is the type of booleans. res is the type of rex:σ )A, sources. The type chan(x1 : σ1 , . . . , xn : σn )A, abbreviated to chan( describes channels carrying tuples consisting of values of types σ1 , . . . , σn . Here the type A approximates how a receiver on the channel may use the elements Resource Usage Analysis for the π-Calculus 303 x1 , . . . , xn of each tuple for communications and resource access. For example, chan(x : res, y : res)xR .y C  describes channels carrying a pair of resources, where a party who receives the actual pair (x , y  ) will first read x and then  and write chan( x)A for chan( x:σ  )A. When close y  . We sometimes omit σ x  is empty, we also write chan. Note that  y / x is treated as a constructor rather than an operator for performing the actual substitution. We write [ y / x] for the latter throughout this paper.  y / xA is slightly different from the relabeling of the standard CCS [17]: y/x(x | y) allows the communication on y, but the relabeling of CCS does not. This difference calls for the introduction of a special transition label {x, y} in Section 3.2. , and the variables in S respectively. We (νx) A,  y / xA, and A↑S bind x, x write FV(A) for the set of free variables in A. We identify behavioral types up to renaming of bound variables. In the rest of this paper, we require that every channel type chan(x1 : σ1 , . . . , xn : σn )A must satisfy FV(A) ⊆ {x1 , . . . , xn }. For example, chan(x:res)xR  is a valid type but chan(x:res)y R  is not. By abuse of notation, we write v1 /x1 , . . . , vn /xn A for vi1 /xi1 , . . . , vik /xik A where {vi1 , . . . , vik } = {v1 , . . . , vn }\{true, false}. For example, true/x, y/zA stands for y/zA. 3.2 Semantics of Behavioral Types l We give a labeled transition relation −→ for behavioral types. 
The transition labels l are l ::= x | x | xξ | τ | {x, y} The label {x, y} indicates the potential to react in the presence of a substitution that identifies x and y. We also extend target to the function on transition labels by: target ({x, y}) = {x, y} target (x) = target(x) = {x} l Figure 2 shows a part of the definition of the transition relation −→ on behavioral types. For the complete definition, see the full paper [15]. We write =⇒ for the τ l l reflexive and transitive closure of −→. We also write =⇒ for =⇒−→=⇒. a at .A→A A→A l xξ target (l)⊆S l A↑S →A ↑S l target (l)⊆S l A↓S →A ↓S τt .A→A A→A τ A→A τ xξ .A → A target (l)∩S=∅ l A↑S →A ↑S A→A l target (l)∩S=∅ τ A↓S →A ↓S Fig. 2. A Part of Definition of Transition semantics of behavioral types 304 N. Kobayashi, K. Suenaga, and L. Wischik Remark 1. (νx) A should not be confused with A↑{x} . (νx) A is the hiding operator of CCS, while A↑{x} just replaces any actions on x with τ [8]. For example, τ yξ (νx) (x. y ξ ) cannot make any transition, but (x. y ξ )↑{x} −→−→ 0↑{x} . We next define a predicate disabled(A, S) inductively as follows. disabled(0, S) disabled(xξ .A, S) if disabled(A, S) and x ∈ S disabled(ac .A, S) if disabled(A, S) disabled(a∅ .A, S) disabled(τc .A, S) if disabled(A, S) disabled(τ∅ .A, S) disabled(A1 | A2 , S) if disabled(A1 , S) and disabled(A2 , S) disabled(A1 ⊕ A2 , S) if disabled(A1 , S) or disabled(A2 , S) disabled(∗A, S) if disabled(A, S) disabled((νx) A, S) if disabled(A, S\{x}) disabled(A↑S  , S) if disabled(A, S\S  ) disabled(A↓S  , S) if disabled(A, S ∩ S  ) disabled( y / xA, S) if disabled(A, {z | [ y / x]z ∈ S}) disabled(µα.A, S) if disabled([µα.A/α]A, S) Intuitively, disabled(A, S) means that A describes a process that may get blocked without accessing any resources in S. The set etracesx (A) defined below is the set of possible access sequences on x described by A. Definition 3 (extended traces). The set etracesx (A) of extended traces is: xξ1 xξn {ξ1 · · · ξn ↓ |∃B.A↓{x} =⇒ · · · =⇒ B ∧ disabled(B, {x})} xξ1 xξn ∪{ξ1 · · · ξn |∃B.A↓{x} =⇒ · · · =⇒ B} We define the subtyping relation A1 ≤ A2 below. Intuitively, A1 ≤ A2 means that a process behaving according to A1 can also be viewed as a process behaving according to A2 . To put in another way, A1 ≤ A2 means that A2 simulates A1 .1 We define ≤ for only closed types, i.e., those not containing free type variables. Definition 4 (subtyping). The subtyping relation ≤ on closed behavioral types is the largest relation that satisfies the following properties: l l – A1 ≤ A2 and A1 −→ A1 implies A2 =⇒ A2 and A1 ≤ A2 for some A2 . – disabled(A1 , S) implies disabled(A2 , S) for any set S of variables. We often write A1 ≥ A2 for A2 ≤ A1 , and write A1 ≈ A2 for A1 ≤ A2 ∧ A2 ≤ A1 . 1 Note that the subtyping relation defined here is the converse of the one used in Igarashi and Kobayashi’s generic type system [8]. Resource Usage Analysis for the π-Calculus 3.3 305 Typing We consider two kinds of judgments, Γ v : σ for values, and Γ P : A for processes. Γ is a mapping from a finite set of variables to value types. In Γ P : A, the type environment Γ describes the types of the variables, and A describes the possible behaviors of P . For example, x : chan(b : bool)0 P : x | x implies that P may send booleans along the channel x twice. The judgment y : chan(x : chan(b : bool)0)x Q : y means that Q may perform an input on y once, and then it may send a boolean on the received value. 
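For intuition about types-as-processes, the following toy sketch (ours; it covers only action prefixes, parallel composition and 0, a tiny fragment of the behavioral types above, and ignores attributes, hiding, renaming and blocking) enumerates the action sequences that a type allows:

def traces(t):
    """Completed action sequences of a toy type: ("zero",),
    ("act", label, cont) for a prefix, or ("par", t1, t2)."""
    kind = t[0]
    if kind == "zero":
        return {()}
    if kind == "act":
        _, label, cont = t
        return {(label,) + s for s in traces(cont)}
    if kind == "par":
        _, t1, t2 = t
        return {s for s1 in traces(t1) for s2 in traces(t2)
                for s in interleavings(s1, s2)}
    raise ValueError(kind)

def interleavings(s1, s2):
    if not s1:
        return {s2}
    if not s2:
        return {s1}
    return ({(s1[0],) + r for r in interleavings(s1[1:], s2)} |
            {(s2[0],) + r for r in interleavings(s1, s2[1:])})

zero = ("zero",)
x_par_x = ("par", ("act", "x", zero), ("act", "x", zero))
x_seq_x = ("act", "x", ("act", "x", zero))
assert traces(x_par_x) == traces(x_seq_x) == {("x", "x")}

The full types additionally track input/output polarity, success attributes and blocking (via the disabled predicate), which this toy omits; those ingredients are what the simulation-based subtyping of Definition 4 compares.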
Note that in the judgment Γ P : A, the type A is an approximation of the behavior of P on free channels. P may do less than what is specified by A, but must not do more; for example, x : chan( )0 x  : x | x holds but x : chan( )0 x . x  : x does not. Because of this invariant, if A does not perform any invalid access, neither does P . We write dom(Γ ) for the domain of Γ . We write ∅ for the empty type environment, and write x1 : τ1 , . . . , xn : τn (where x1 , . . . , xn are distinct from each other) for the type environment Γ such that dom(Γ ) = {x1 , . . . , xn } and Γ (xi ) = τi for each i ∈ {1, . . . , n}. When x ∈ dom(Γ ), we write Γ, x : τ for the type environment ∆ such that dom(∆) = dom(Γ ) ∪ {x}, ∆(x) = τ , and ∆(y) = Γ (y) for y ∈ dom(Γ ). We define the value judgment relation Γ v:σ to be the least relation closed under Γ, x:σ x:σ Γ true:bool Γ false:bool. We write Γ v: σ as an abbreviation for (Γ v1 :σ1 ) ∧ · · · ∧ (Γ vn :σn ). Figure 3 gives the rules for the relation Γ P : A. We explain key rules below. In rule (T-Out), the first premise Γ P : A2 implies that the continuation of the y:σ  )A1  output process behaves like A2 , and the second premise Γ x : chan( Γ  P : A2 Γ  x : chan(y : σ)A1  Γ  v:σ (T-Out) Γ  xt v. P : xt . (v/yA1 | A2 ) Γ, y : σ  P : A2 Γ  x : chan(y : σ)A1  A2 ↓{y} ≤ A1 Γ  xt (y). P : xt . (A2 ↑{y} ) Γ  P1 : A1 Γ  P2 : A2 (T-Par) Γ  0:0 Γ  P1 | P2 : A1 | A2 Γ  P :A Γ  x : res Γ  P :A (T-Rep) Γ  accξ (x).P : xξ .A Γ  ∗P : ∗A Γ  v : bool Γ  P :A Γ  Q:A (T-If) Γ  (νx) P : (νx) A2 (T-Zero) (T-Acc) etracesx (A) ⊆ Φ Γ, x : res  P : A Φ Γ  (N x)P : A↑{x} Γ  if v then P else Q : A Γ, x : chan(y : σ)A1   P : A2 (T-In) Γ  P : A (T-New) Fig. 3. Typing Rules A ≤ A Γ  P :A (T-NewR) (T-Sub) 306 N. Kobayashi, K. Suenaga, and L. Wischik implies that the tuple of values v being sent may be used by an input process according to  v / y A1 . Therefore, the whole behavior of the output process is described by x. ( v / yA1 | A2 ). Note that, as in previous behavioral type systems [3, 8], the resource access and communications made on v by the receiver of v are counted as the behavior of the output process. In rule (T-In), the first premise implies that the continuation of the input process behaves like A2 . Following previous behavioral type systems [3, 8], we split A2 into two parts: A2 ↓{y} and A2 ↑{y} . The first part describes the behavior on the received values y and is taken into account in the channel type. The second part describes the resource access and communications performed on other values, and is taken into account in the behavioral type of the input process. The condition A2 ↓{y} ≤ A1 requires that the access and communication behavior on y conforms to A1 , the channel arguments’ behavior. In (T-New), the premise implies that P behaves like A, so that (νx) P behaves like (νx) A. Here, we only require that x is a channel, unlike in the previous behavioral type systems for the π-calculus [8, 10]. That is because we are only interested in the resource access behavior; the communication behavior is used only for accurately inferring the resource access behavior. In (T-NewR), we check that the process’s behavior A conforms to the resource usage specification Φ. Rule (T-Sub) allows the type A of a process to be replaced by its approximation A. Example 2. Consider the process P = (νs) (∗s(n, x, r). P1 | (NΦ x)P2 ), where: P1=if n = 0 then r else (νr ) (sn − 1, x, r  | r c (). read(x).r) P2=(νr) (init(x).s100, x, r | rc (). 
close(x)) Φ = (IR∗ C ↓)# Let A1 = µα.(r ⊕ (νr ) (r /rα|r c . xR .r) and let Γ = s:chan(n:int, x:res, r:chan) A1 . Then Γ, n:int, x:res, r:chan P1 : A1 Γ Γ ∗s(n, x, r). P1 : ∗s. (A1 ↑{n,x,r} ) ≈ ∗s P2 : (νr) (x .A1 |rc . x ) I C So long as etracesx ((νr) (xI .A1 |r. xC )) ⊆ Φ, we obtain ∅ P : 0. See Section 4.1 for the algorithm that establishes etracesx (·) ⊆ Φ. 2 Remark 2. The type A1 in the example above demonstrates how recursion, hiding, and renaming are used together. In general, in order to type a recursive process of the form ∗s(x). (νy) (· · · sy · · · ), we need to find a type that satisfies (νy) (· · · y/xA · · · ) ≤ A. Moreover, for the type inference (in Section 4), we must find the least such A. Thanks to the type constructors for recursion, hiding, and renaming, we can always do that: A can be expressed by µα.(νy) (· · · y/xα · · · ). The following theorem states that no well-typed process performs an invalid access to a resource. Theorem 1 (type soundness (safety)). Suppose that P is safe. If Γ and P −→∗ Q, then Q is safe. P :A Resource Usage Analysis for the π-Calculus 307 Theorem 2 below states that well-typed programs eventually perform all the necessary resource accesses (unless the whole process diverges).  c  Definition 5 (well-annotatedness). P is active if P  ( ν N)(x v . Q | R) or   P  ( ν N)(xc ( y ). Q | R). P is well-annotated if for any P such that P −→∗ P  and active(P  ), there exists P  such that P  −→ P  . Theorem 2. If well annotated (P ) and ∅ P : A, then P is partially live. 4 Type Inference Algorithm This section discusses an algorithm which takes a closed process P as an input and checks whether ∅ P : 0 holds. The algorithm consists of the following steps. 1. Extract constraints on type variables based on the (syntax-directed version of) typing rules. 2. Reduce constraints to trace inclusion constraints of the form {etracesx1 (A1 ) ⊆ Φ1 , . . . , etracesxn (An ) ⊆ Φn } 3. Decide whether the trace inclusion constraints are satisfied. The algorithm for Step 3 is sound but not complete. The first two steps are fairly standard [9, 10]. Based on the typing rules, we can transform ∅ P : 0 to equivalent constraints of the form: {α1 ≥ A1 , . . . , αn ≥ An , etracesx1 (B1 ) ⊆ Φ1 , . . . , etracesxm (Bm ) ⊆ Φm } where α1 , . . . , αn are different from each other. Each subtype constraint α ≥ A can be replaced by α ≥ µα.A. Therefore, the above constraints can be further reduced to:  /  / α]B1 ) ⊆ Φ1 , . . . , etracesxm ([A α]Bm ) ⊆ Φm } {etracesx1 ([A Here, A1 , . . . , An are the least solutions for the subtype constraints. Thus, we have reduced type checking to the validity of trace inclusion constraints of the form etracesx (A) ⊆ Φ. Example 3. Recall Example 2. We obtain the constraint etracesx (A1 )⊆(IR∗ C)# where A1 = (νr) (xI .s. A2 | r. xC ) A3 = µα2 .α2 A2 = µα1 .r. A3 ⊕ (νr ) (s. r /rα1 | r . xR .r. A3 )↓{n,x,r} . 4.1 Step 3: Constraint Solving We present an approximate algorithm for checking how to check a trace inclusion constraint etracesx (A) ⊆ Φ when the trace set Φ is a regular language. (Actually, we can extend the algorithm to deal with the case where Φ is a deterministic Petri net language: see the full version [15].) 308 N. Kobayashi, K. Suenaga, and L. Wischik The algorithm consists of the following three steps. – Approximate the behavior of A↓{x} by a (labeled) Petri net NA1 ,x . – Construct a Petri net NA1 ,x  MΦ that simultaneously simulates NA1 ,x and a minimized deterministic automaton MΦ that accepts Φ. 
– Check that NA1 ,x  MΦ does not reach any invalid state. Here, the set of invalid states consists of (1) states where NA1 ,x can make a ξ-transition while MΦ cannot, and (2) states where NA1 ,x is disabled (in other words, can make a ↓-transition) while MΦ cannot make a ↓-transition. The last part amounts to solving a reachability problem of Petri nets. In the implementation, we further approximate the Petri net by a finite state machine. We sketch the first step of the algorithm with an example below. Attributes are omitted below for simplicity. Please consult the full version [15] for more details and the other two steps. In Example 3 above, we have reduced the typability of the process to the equivalent constraint etracesx (A1 ) ⊆ Φ where Φ = (IR∗ C ↓)# and A1 ↓{x} ≈ (νr) (xI .A2 | r. xC ) A2 = r ⊕ (νr ) (r /rA2 | r . xR .r) Here, we have omitted A3 = µα.α since it is insignificant. Approximate the behavior of A1 ↓{x} by a Petri net [19] NA1 ,x . This part is similar to the translation of usage expressions into Petri nets in Kobayashi’s previous work [10, 11, 14]. Since the behavioral types are more expressive (having recursion, hiding, and renaming), however, we need to approximate the behavior of a behavioral type unlike in the previous work. In this case A1 ↓{x} is infinite. To make it tractable we make a sound approximation A1 by pushing (ν) to top level, and we eliminate r /r: A1 = (νr, r ) (xI .A2 | r. xC ) A2 = r ⊕ (A3 | r . xR .r) A3 = r ⊕ (A3 | r . xR .r ) Then NA1 ,x is as pictured in Figure 4. (Here we treat A1 ⊕ A2 as τ.A1 ⊕ τ.A2 for clarity. We also use a version of Petri nets with labeled transitions.) The rectangles are the places of the net, and the dots labeled by τ, xR , etc. are the B1 B10 xI .A2 r.xC I B2 τ.r ⊕ τ.(A3|r.xR.r) τ τ B5 τ xC B8 R r .xR .r B3 τ.r  ⊕ τ.(A3|r .xR.r) τ τ B6 B11 r B9 r r .xR .r  xR.r τ B4 R τ Fig. 4. NA1 ,x B7 xR.r  C Resource Usage Analysis for the π-Calculus 309 transitions of the net. Write ix for the number of tokens at node Bx . The behavior  A1 corresponds to the initial marking {i1 =1, i10 =1}. We say that the nodes B together with the restricted names (r, r ) constitute a basis for A1 . Note here that etracesx (A1 ) ⊆ etracesx (A1 ) = ptraces(NA1 ,x ) where ptraces(NA1 ,x ) is the set of traces of the Petri net. Thus, ptraces(NA1 ,x ) ⊆ Φ is a sufficient condition for etracesx (A1 ) ⊆ Φ . The key point here is that A1 still has infinite states, but all its reachable states can be expressed in the form (νr, r ) (i1 B1 | · · · | i11 B11 ) (where ik Bk is the parallel composition of ik copies of Bk ), a linear combination  That is why we could express A1 by the Petri net of finitely many processes B. as above. 5 Implementation We have implemented a prototype resource usage analyzer based on the type system proposed in this paper. We have tested all the examples given in the present paper. The implementation can be tested at http://www.yl.is.s.u-tokyo.ac. jp/~kohei/usage-pi/. The analyzer takes a pi-calculus program as an input, and uses TyPiCal[11] to annotate each input or output action with an attribute on whether the action is guaranteed to succeed automatically. The annotated program is then analyzed based on the algorithm described in Section 4. The followings are some design decisions we made in the current implementation. We restrict the resource usage specification (Φ) to the regular languages although in future we may extend it to deterministic Petri net languages. 
In the algorithm for checking etracesx(A) ⊆ Φ, we blindly approximate A by pushing all of its ν-prefixes to the top level. In the future we might utilize an existing model checker to handle the case where A is already finite. To solve the reachability problems of Petri nets, we approximate the number of tokens in each place by an element of the finite set {0, 1, 2, “3 or more”}. That approximation reduces Petri nets to finite state machines, so we can use BDDs to compute an approximation of the reachable states.

Figure 5 shows a part of a successful run of the analyzer. The first process (on the second line) of the input program runs a server, which returns a new, initialized resource. We write ! and ? for output and input actions. The resource access specification is here expressed by a regular expression init(read|write)*close. The second process runs infinitely many client processes, each of which sends a request for a new resource and, after receiving it, reads and closes it. The third process (on the 6th line) is a tail-recursive version of the replicated service in Example 2. Here, a boolean is passed as the first argument of s instead of an integer, as the current system is not adapted to handle integers; this does not affect the analysis, since the system ignores the value and simply inspects both branches of the conditional. Note that the program creates infinitely many resources and has infinitely many states. The first output is the annotated version of the input program produced by TyPiCal, where !! and ?? are an output and an input with the attribute c.

Input:
new create,s in
*(create?(r).newR {init(read|write)*close}, x in acc(x,init).r!(x))
| *(new r in create!(r) | r?(y).new c in s!(false,y,c)
    | s!(false,y,c)
    | c?().c?().acc(y,close))
| *(s?(b,x,r).if b then r!() else acc(x,read).s!(b,x,r))

Output:
(*** The result of lock-freedom analysis ***)
new create, s in
*create??(r).newR {init(read|write)*close}, x in acc(x, init).r!!(x)
| *(new r in create!!(r) | r??(y).new c in s!!(false,y,c)
    | s!!(false,y,c)
    | c??().c??().acc(y,close))
...
No error found

Fig. 5. A Sample Run of the Analyzer

6 Related Work

Resource usage analysis and similar analyses have recently been studied extensively, and a variety of methods from type systems to model checking have been proposed [1, 5–7, 9, 16, 20]. However, only a few of them deal with concurrent languages, and to our knowledge, none of them deal with the partial liveness property. Nguyen and Rathke [18] propose an effect-type system for a kind of resource usage analysis for functional languages extended with threads and monitors. In their language, neither resources nor monitors can be created dynamically. On the other hand, our target language is the π-calculus, so that our type system can be applied to programs that may create infinitely many resources (due to the existence of primitives for dynamic creation of resources: recall the example in Figure 5), and also to programs that use a wide range of communication and synchronization primitives. Model checking technologies [2, 4, 21, 22] are of course applicable to concurrent languages, but naive applications of model checking would suffer from the state explosion problem, especially for expressive concurrent languages like the π-calculus, where resources and communication channels can be dynamically created and passed around. Actually, our type-based analysis can be considered as a kind of abstract model checking.
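To make the abstraction from Section 5 concrete, the sketch below (our own illustration, not the tool's code) fires one transition of a Petri net whose markings have been abstracted place-wise into {0, 1, 2, "3 or more"}. Since decrementing the saturated value is ambiguous, both outcomes are kept, which keeps the abstraction a sound over-approximation while making the abstract state space finite.

from itertools import product

TOP = 3  # abstract token count standing for "3 or more"

def fire(marking, consumed, produced):
    """One abstract firing of a Petri-net transition. `marking` maps
    places to {0, 1, 2, TOP}; `consumed` and `produced` are lists of
    places, one token each (an assumption to keep the sketch short).
    Decrementing TOP may leave either 2 or TOP, so both successor
    markings are produced."""
    if any(marking.get(p, 0) == 0 for p in consumed):
        return []                                   # transition not enabled
    options = []                                    # per place: possible new values
    for p in consumed:
        v = marking[p]
        options.append([(p, TOP), (p, 2)] if v == TOP else [(p, v - 1)])
    successors = []
    for choice in product(*options):
        m = dict(marking)
        m.update(dict(choice))
        for p in produced:
            m[p] = min(m.get(p, 0) + 1, TOP)        # saturating increment
        successors.append(m)
    return successors

# Illustrative fragment in the spirit of Fig. 4: an xI transition moving
# the single token from place B1 to place B2.
print(fire({"B1": 1, "B10": 1}, ["B1"], ["B2"]))
# [{'B1': 0, 'B10': 1, 'B2': 1}]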
The behavioral types extracted by (the first two steps of) the type inference algorithm are abstract concurrent programs, each of which captures the access behavior on each resource. Then, conformance of the abstract program with respect to the resource usage specification is checked as a model checking problem. From that perspective, a nice point about our approach is that our type, which describes a resource-wise behavior, has much smaller state space than the whole program. In particular, if infinitely many resources are dynamically created, the whole program has in- Resource Usage Analysis for the π-Calculus 311 finite states, but it is often the case that our behavioral types are still finite (indeed so for the example in Figure 5). Technically, closest to our type system are that of Igarashi and Kobayashi [8] and that of Chaki, Rajamani, and Rehof [3]. Those type systems are developed for checking the communication behavior of a process, but by viewing a set of channels as a resource, it is possible to use those type systems directly for the resource usage analysis. As mentioned in Section 1, the main contributions of the present work with respect to those type systems are realization of automatic verification while keeping enough precision, and verification of the partial liveness. The parameterization of the type system with an arbitrary mechanism to guarantee deadlock-freedom opens a new possibility of integrating type-based techniques with other verification techniques (the current implementation uses another type-based analyzer to infer deadlock-freedom, but one can replace that part with a model checker or an abstract interpreter). 7 Conclusion We have presented a type-based technique for verifying resource usage of concurrent programs. Future work includes more serious assessment of the effectiveness of our analysis and extensions of the type system to deal with other typical synchronization primitives like join-patterns and external choice. Acknowledgments We would like to thank Andrew Gordon, Jakob Rehof, and Eijiro Sumii for useful discussions and comments. We would also like to thank anonymous referees for useful comments and suggestions. References 1. T. Ball, B. Cook, V. Levin, and S. K. Rajamani. SLAM and static driver verifier: Technology transfer of formal methods inside microsoft. In Integrated Formal Methods 2004, volume 2999 of LNCS, pages 1–20. Springer-Verlag, 2004. 2. T. Ball and S. K. Rajamani. The SLAM project: Debugging system software via static analysis. In Proc. of POPL, pages 1–3, 2002. 3. S. Chaki, S. Rajamani, and J. Rehof. Types as models: Model checking messagepassing programs. In Proc. of POPL, pages 45–57, 2002. 4. M. Dam. Model checking mobile processes. Information and Computation, 129(1):35–51, 1996. 5. R. DeLine and M. Fähndrich. Enforcing high-level protocols in low-level software. In Proc. of PLDI, pages 59–69, 2001. 6. R. DeLine and M. Fähndrich. Adoption and focus: Practical linear types for imperative programming. In Proc. of PLDI, 2002. 7. J. S. Foster, T. Terauchi, and A. Aiken. Flow-sensitive type qualifiers. In Proc. of PLDI, pages 1–12, 2002. 312 N. Kobayashi, K. Suenaga, and L. Wischik 8. A. Igarashi and N. Kobayashi. A generic type system for the pi-calculus. Theor. Comput. Sci., 311(1-3):121–163, 2004. 9. A. Igarashi and N. Kobayashi. Resource usage analysis. ACM Trans. Prog. Lang. Syst., 27(2):264–313, 2005. Preliminary summary appeared in Proceedings of POPL 2002. 10. N. Kobayashi. 
Type-based information flow analysis for the pi-calculus. Acta Informatica. to appear. 11. N. Kobayashi. TyPiCal: A type-based static analyzer for the pi-calculus. Tool available at http://www.kb.ecei.tohoku.ac.jp/~ koba/typical/. 12. N. Kobayashi. A partially deadlock-free typed process calculus. ACM Trans. Prog. Lang. Syst., 20(2):436–482, 1998. 13. N. Kobayashi. A type system for lock-free processes. Info. Comput., 177:122–159, 2002. 14. N. Kobayashi, S. Saito, and E. Sumii. An implicitly-typed deadlock-free process calculus. In Proc. of CONCUR2000, volume 1877 of LNCS, pages 489–503. SpringerVerlag, August 2000. 15. N. Kobayashi, K. Suenaga, and L. Wischik. Resource usage analysis for the picalculus. Full version, 2005. http://www.kb.ecei.tohoku.ac.jp/~ koba/papers/ usage-pi.pdf. 16. K. Marriott, P. J. Stuckey, and M. Sulzmann. Resource usage verification. In Proceedings of the First Asian Symposium on Programming Languages and Systems (APLAS 2003), volume 2895 of LNCS, pages 212–229, 2003. 17. R. Milner. Communication and Concurrency. Prentice Hall, 1989. 18. N. Nguyen and J. Rathke. Typed static analysis for concurrent, policy-based, resource access control. draft. 19. J. L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-Hall, 1981. 20. C. Skalka and S. Smith. History effects and verification. In Proceedings of the First Asian Symposium on Programming Languages and Systems (APLAS 2004), volume 3302 of LNCS, pages 107–128, 2004. 21. B. Victor and F. Moller. The Mobility Workbench — a tool for the π-calculus. In CAV’94: Computer Aided Verification, volume 818 of LNCS, pages 428–440. Springer-Verlag, 1994. 22. P. Yang, C. R. Ramakrishnan, and S. A. Smolka. A logical encoding of the picalculus: Model checking mobile processes using tabled resolution. In Proceedings of VMCAI 2003, volume 2575 of LNCS, pages 116–131. Springer-Verlag, 2003. Semantic Hierarchy Refactoring by Abstract Interpretation Francesco Logozzo1 and Agostino Cortesi2 1 2 École Normale Supérieure, France Università Ca’ Foscari di Venezia, Italy Abstract. A semantics-based framework is presented for the definition and manipulation of class hierarchies for object-oriented languages. The framework is based on the notion of observable of a class, i.e., an abstraction of its semantics when focusing on a behavioral property of interest. We define a semantic subclass relation, capturing the fact that a subclass preserves the behavior of its superclass up to a given (tunable) observed property. We study the relation between syntactic subclass, as present in mainstream object-oriented languages, and the notion of semantic subclass. The approach is then extended to class hierarchies, leading to a semantics-based modular treatment of a suite of basic observablepreserving operators on hierarchies. We instantiate the framework by presenting effective algorithms that compute a semantic superclass for two given classes, that extend a hierarchy with a new class, and that merge two hierarchies by preserving semantic subclass relations. 1 Introduction In the object-oriented paradigm, a crucial role is played by the notion of class hierarchy. Being A a subclass of B captures the fact that the state and the behavior of the elements of A are coherent with the intended meaning of B, while disregarding the additional features and functionalities that characterize the subclass. The approach of mainstream object-oriented languages, like Java and C++, to class hierarchies can be seen as merely syntactic. 
In such a view hierarchies are collections of classes ordered by the transitive closure of explicitly declared subclass or subtype relations. This is why the main theoretical and practical contributions to hierarchy refactoring issues [32, 33] combine static and dynamic analyses that focus only on syntactic elements. However, as pointed out by [29], this approach has severe limitations, as it leads to troubles when trying to face the issue of extending a given class hierarchy. In this paper we adopt an alternative, semantics-based approach for the definition and manipulation of class hierarchies. It uses previous works on abstract interpretation theory [10], that allows formalizing the notion of different levels of property abstraction and of abstract semantics. This framework is based on the notion of observable of a class, i.e., an abstraction of the class semantics that focuses on a behavioral property of interest. The intuition is that the semantics of a class can be abstracted by parameterizing it with respect to a given domain E.A. Emerson and K.S. Namjoshi (Eds.): VMCAI 2006, LNCS 3855, pp. 313–331, 2006. c Springer-Verlag Berlin Heidelberg 2006  314 F. Logozzo and A. Cortesi of observables, and that a notion of semantic subclass can then be defined in terms of preservation of these observables. This notion of semantic subclass can be seen as a proper generalization of the concept of class subtyping, having the advantage of being tunable with respect to a given underlying abstract domain and hence of the properties we are interested to capture. The notion of syntactic subclass, forcing that fields and methods have the same names, is too weak to state something about semantic subclassing, but compatibility results on the syntactic extension on one hand, and suitable renaming functions on the other can be stated that allow us to properly relate the two subclass relations. The interest of the notion of semantic subclass become even more interesting when facing the problem of manipulating class hierarchies which has more than thousands of classes (for instance, NetBeans [27] is made up of 8328 classes). We formalize the notion of semantic ordering of hierarchies as “when is it the case that a hierarchy is more informative with respect to a given observable? ” We show that this notion of semantic subclassing – can be formally related to the traditional syntactic-based subclassing relation; – it is crucial for designing automatic and modular verification tools for polymorphic code; – it enlightens the trade-off between the expressive power of specification languages for object-oriented languages and the subtype relations they support; – it is the base to design algorithms and tools for extending, refactoring and merging class hierarchies. In fact, in the paper we show how it can be used for the automatic and modular verification of polymorphic code, for bounding the expressive power of specification languages for object-oriented languages and for the characterization of semantic class hierarchies. Intuitively, semantic class hierarchies ensure that, up to a given observable, classes lower in the hierarchy specializes the behavior of the upper classes. We instantiate our framework by design algorithms for extending, refactoring and merging class hierarchies. Such algorithms represent the basis for our mid-term goal, that is a tool for the modular verification and the semi-automatic refactoring of large class hierarchies. Paper Structure. 
In Section 2, an example introduces the main ideas of the paper. In Section 3, the notion of observable is introduced as an abstraction of the concrete semantics. In Section 4, we introduce the semantic subclass relation, discuss its relationship with the syntactic notion, and show its use for the modular verification of polymorphic code. In Section 5, the framework is lifted to class hierarchies by introducing a suite of refactoring operators. Finally, Section 6 discusses related work, and Section 7 concludes.

2 A Motivating Example

Let us consider the five classes described in Fig. 1, which encode different sets of integer numbers. In class Even, variable x can only take even values, whereas variable x of Odd takes odd values only. The instance variables of MultEight and MultTwelve can only be assigned values that are multiples of 8 and 12, respectively.

class Integer {
  int x;
  init(){ x = 0 }  add() { x += 1 }  sub() { x -= 1 }
}
class Even {
  int x;
  init(){ x = 0 }  add() { x += 2 }  sub() { x -= 2 }
}
class Odd {
  int x;
  init(){ x = 1 }  add() { x += 2 }  sub() { x -= 2 }
}
class MultEight {
  int x;
  init(){ x = 0 }  add() { x += 16 }  sub() { x -= 8 }
}
class MultTwelve {
  int x;
  init(){ x = 0 }  add() { x += 24 }  sub() { x -= 12 }
}

Fig. 1. Running examples

A first question to address is “What are the admissible hierarchies among such classes?”. A hierarchy is admissible when the subclasses preserve a given property of their superclass. So, when the parity of the field x is observed, both the class hierarchies H1 and H2 in Fig. 2 are admissible. This is true for H1, as the value of MultEight.x is always a multiple of 8, and in particular it is even. As a consequence, when just parity is observed, MultEight preserves the behavior of Even. On the other hand, H2 is also an admissible class hierarchy w.r.t. parity, as the values taken by MultTwelve.x and MultEight.x are even numbers, too. As a consequence, MultTwelve preserves the parity behavior of its superclass MultEight. Nevertheless, when we consider a more precise property, for instance the value taken by x up to a numerical congruence, then H2 is no longer an admissible hierarchy. In fact, as in general a multiple of 12 is not a multiple of 8, MultTwelve does not preserve the congruence property of its ancestor MultEight.

Fig. 2. H1 and H2, two possible class hierarchies: (a) H1, with Even and Odd below Integer, and MultEight and MultTwelve below Even, admissible for congruences; (b) H2, with MultEight, Odd and Even below Integer, and MultTwelve below MultEight, admissible for parities

“Why do we need admissible class hierarchies?” For two reasons: (i) they allow one to design modular verification tools for polymorphic methods, and (ii) they support the design of semantics-preserving operations on class hierarchies. To illustrate (i), consider the class hierarchy H1 and the method inv, defined as follows: inv(Even e){ return 1/(1 − e.x%2) }. In order to prove that inv never performs a division by zero, it suffices to prove it w.r.t. Even instances. In fact, as H1 is admissible for parity, all the subclasses of Even preserve the property that x is an even number. Nevertheless, in order to prove it correct also for all the future extensions of the hierarchy, we need to ensure that all the manipulations of class hierarchies preserve its admissibility. This leads to (ii).
This semantic approach can be used to define, and prove correct, manipulation operations on class hierarchies that preserve admissibility w.r.t. a given property. For instance, we will show an algorithm for class insertion. Such an algorithm, when applied to the classes of Fig. 3 and to the hierarchy H1, returns the hierarchy H3 in Fig. 4, which is still admissible for congruences (and hence for parities). As a consequence, the method inv is still guaranteed to be correct for all possible inputs.

class MultFour {
  int x;
  init() { x = 0 }
  add()  { x += 4 }
  sub()  { x -= 4 }
}

class MultTwenty {
  int x;
  init() { x = 0 }
  add()  { x += 20 }
  sub()  { x -= 60 }
}

Fig. 3. Two classes to be added to H1

Fig. 4. H3: the hierarchy H1 augmented with MultTwenty and MultFour. Even and Odd are subclasses of Integer; MultTwenty and MultFour are subclasses of Even; MultEight and MultTwelve are subclasses of MultFour.

3 Concrete and Abstract Semantics of Classes

In this section, we introduce the syntax and the concrete semantics of classes. Then, we define the domain of observables and the abstract semantics of a class.

3.1 Syntax

A class is a template for objects. It is provided by the programmer, who specifies the fields, the methods and the class constructor.

Definition 1 (Classes). A class C is a triple ⟨F, init, M⟩ where F is a set of distinct variables, init is the class constructor and M is a set of method definitions. The set of all the classes is denoted by C.

Like in Smalltalk [15], methods are untyped and fields are private. This is just to simplify the exposition and it does not cause any loss of generality: any external access to a field f can be simulated by a pair of methods set_f/get_f. Furthermore, we assume that a class has only one constructor. The generalization to an arbitrary number of constructors is straightforward. The interface of a class is the set of messages it can answer:

Definition 2 (Class Interface). Given a class C = ⟨F, init, M⟩, let Mnames be the names of C's methods. Then the interface of C is ι(C) = {init} ∪ Mnames.
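To make Definitions 1 and 2 concrete, here is a minimal Python rendering (ours; the dictionary encoding is an assumption, not the paper's formalism) of the class Even as a triple ⟨F, init, M⟩ together with its interface ι:

Even = {'fields': {'x'},
        'init': lambda: {'x': 0},
        'methods': {'add': lambda s: {'x': s['x'] + 2},
                    'sub': lambda s: {'x': s['x'] - 2}}}

def iota(C):
    # iota(C) = {init} U Mnames: the set of messages an instance answers (Def. 2)
    return {'init'} | set(C['methods'])

assert iota(Even) == {'init', 'add', 'sub'}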
3.2 Concrete Semantics

Given a class C = ⟨F, init, M⟩, every instance of C has an internal state σ ∈ Σ that is a function from fields to values, i.e., Σ = [F → Dval], where Dval is the semantic domain of values. When a class is instantiated, the class constructor is called to set the internal state of the new object. This is modeled by a semantic function [[init]] ∈ [Dval → ℘(Σ)]. We consider sets in order to model non-determinism, e.g., user input or random choices. The semantics of a method m is a function [[m]] ∈ [Dval × Σ → ℘(Dval × Σ)]. A method is called with two parameters: the method actual parameters and the internal state of the object it belongs to. The output of a method is a set of pairs ⟨return value (if any), new object state⟩.

The most precise state-based property of a class C is the set of states reached by any execution of every instance of C in any possible context. In this paper, we consider just state-based properties. Such an approach can be shown to be an abstraction of a trace-based semantics for object-oriented languages [23, 22], in which just the states before and after the invocation of a method are retained. The set of states reached by any execution of any instance of a class can be expressed as a least fixpoint on the complete boolean lattice ⟨℘(Σ), ⊆⟩.

The set of the initial states, i.e., the states reached after any invocation of the constructor of C, is S0 = {σ ∈ Σ | ∃v ∈ Dval. σ ∈ [[init]](v)}. The states reached after the invocation of a method m are given by the method collecting forward semantics [[m]]→ ∈ [℘(Σ) → ℘(Σ)]:

[[m]]→(S) = {σ′ ∈ Σ | ∃σ ∈ S. ∃v ∈ Dval. ∃v′ ∈ Dval. ⟨v′, σ′⟩ ∈ [[m]](v, σ)}.

The class reachable states are the least solution of the following recursive equations:

S = S0 ∪ ⋃_{m∈M} Sm
Sm = [[m]]→(S),  m ∈ M.     (1)

The above equations characterize the set of states that are reachable before and after the invocation of any method in any instance of the class. Stated otherwise, they consider all the states reached after any possible invocation, in any order, with any input values of the methods of a class. A more general situation, in which the context may update the fields of an object, e.g., because of aliasing, is considered in [22]. The least solution of (1) w.r.t. set inclusion corresponds to a tuple ⟨S, S0, {m : Sm}⟩ such that S is a class invariant [21, 23, 22] and, for each method m, Sm is the strongest postcondition of the method. The method preconditions can be obtained by going backward from the postconditions: given a method m and its postcondition, we consider the set of states from which it is possible to reach a state in Sm by an invocation of m. Formally, the collecting backward method semantics [[m]]← ∈ [℘(Σ) → ℘(Σ)] is defined as

[[m]]←(S) = {σ ∈ Σ | ∃σ′ ∈ S. ∃v ∈ Dval. ∃v′ ∈ Dval. ⟨v′, σ′⟩ ∈ [[m]](v, σ)}

and the method preconditions are Bm = [[m]]←(Sm). The concrete class semantics, i.e., the most precise property of a class [10], is the triple [[C]] = ⟨S, S0, {m : Bm → Sm}⟩.

The use of the concrete semantics [[C]] for the definition of the observables of a class has two drawbacks. First, in general the computation of the least fixpoint of (1) may be unfeasible, and the sets S, Sm and Bm may not be computer-representable. Therefore, this approach is not suitable for an effective definition of semantic subclassing. Second, it is too precise, as it may differentiate classes that do not need to be distinguished. For example, let us consider two classes StackWithList and StackWithArray which implement a stack by using respectively a linked list and a resizable array. Both of them have push and pop methods. If they are observed using the concrete semantics, then the two classes are unrelated, as the internal representation of the stack is different. On the other hand, when the behavior w.r.t. the invocation of methods is observed, they act in the same way, e.g., no difference can be made between the values returned by the respective pop methods: both of them return the value on the top of the stack. In order to overcome these drawbacks, we consider abstract domains that encode the relevant properties and abstract semantics that are feasible, i.e., which are sound, but not necessarily complete, abstractions of the concrete semantics.
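The least fixpoint of (1) can be illustrated on the Even class. The sketch below (ours) truncates the state space to |x| <= 20 so that the iteration terminates, which is precisely the feasibility drawback pointed out above: the true set of reachable states is infinite.

def reachable(S0, steps, bound=20):
    # Least solution of (1): S = S0 U union over m of [[m]]->(S), truncated
    S = set(S0)
    while True:
        new = {step(x) for step in steps for x in S}
        new = {x for x in new if abs(x) <= bound}
        if new <= S:
            return S
        S |= new

S = reachable({0}, [lambda x: x + 2, lambda x: x - 2])   # the Even class
assert all(x % 2 == 0 for x in S)   # its class invariant: every reachable x is even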
3.3 Domain of Observables

An observable of a class C is an approximation of its semantics that captures some aspects of interest of the behavior of C. We build a domain of observables starting from an abstraction of sets of object states. Let us consider an abstract domain ⟨P, ⊑⟩, which is a complete lattice, related to the concrete domain by a Galois connection [10]:

(α, γ) : ⟨℘(Σ), ⊆, ∅, Σ, ∪, ∩⟩ ⇄ ⟨P, ⊑, ⊥, ⊤, ⊔, ⊓⟩.     (2)

For instance, if we are interested in the linear relations between the values of the fields of the instances of C, we instantiate P with the Octagons abstract domain [26]. On the other hand, if we are interested in object aliasing, then we are likely to choose for P an abstract domain that captures shapes, e.g., [30, 31].

Once ⟨P, ⊑⟩ is fixed, the abstract domain ⟨O[P], ⊑o[P]⟩ of the observables of a class is built on top of it. The elements of the abstract domain belong to the set:

O[P] = {⟨S̄, S̄0, {m : ⟨V̄m, B̄m⟩ → S̄m}⟩ | S̄, S̄0, V̄m, B̄m, S̄m ∈ P}.

Intuitively, an element of O[P] consists of an approximation of the class invariant, the constructor postcondition, and, for each method, an approximation of its precondition and postcondition. A method precondition is in turn made up of two parts, one for the method input values and the other for the internal object state. When no ambiguity arises, we write ⟨O, ⊑o⟩ instead of ⟨O[P], ⊑o[P]⟩. We tacitly assume that if a method n is not defined in a class, then its precondition and postcondition are respectively ⊤ and ⊥.

The partial order ⊑o on O is defined point-wise. Let o1 = ⟨Ī, Ī0, {mi : ⟨Ūi, R̄i⟩ → Īi}⟩ and o2 = ⟨J̄, J̄0, {mj : ⟨W̄j, Q̄j⟩ → J̄j}⟩ be two elements¹ of O. Then the order ⊑o is defined as:

o1 ⊑o o2 ⟺ Ī ⊑ J̄ ∧ Ī0 ⊑ J̄0 ∧ (∀mi. W̄i ⊑ Ūi ∧ Q̄i ⊑ R̄i ∧ Īi ⊑ J̄i).

If o1 and o2 are the observables of two classes A and B, then the order ⊑o ensures that A preserves the class invariant of B and that the methods of A are a "safe" replacement of those with the same name in B. Intuitively, the precondition clause generalizes the observations, made in the context of type theory, of [3]. It states two things. First, if the context satisfies W̄i then it satisfies the inherited method precondition Ūi too (i.e., W̄i ⊑ Ūi); thus the inherited method can be used in any context where its ancestor can. Second, the state of o1 before the invocation of a method must be compatible with that of o2 (i.e., Q̄i ⊑ R̄i). Finally, the postcondition of the inherited method must be at least as strong as that of the ancestor (i.e., Īi ⊑ J̄i).

Having defined ⊑o, it is routine to check that ⊥o = ⟨⊥, ⊥, {mi : ⟨⊤, ⊤⟩ → ⊥}⟩ is the smallest element of O and ⊤o = ⟨⊤, ⊤, {mi : ⟨⊥, ⊥⟩ → ⊤}⟩ is the largest one. The join, ⊔o, and the meet, ⊓o, operators on O can be defined point-wise.

Suppose that the order relation ⊑ on P is decidable [28]. This is the case for the abstract domains used in effective static analyses. As ⊑o is defined in terms of ⊑, and the universal quantification ranges over a finite number of methods, ⊑o is decidable too.

Theorem 1. Let ⟨P, ⊑, ⊥, ⊤, ⊔, ⊓⟩ be a complete lattice. Then ⟨O, ⊑o, ⊥o, ⊤o, ⊔o, ⊓o⟩ is a complete lattice. Moreover, if ⊑ is decidable then ⊑o is decidable too.

¹ We use the same index for methods with the same name. For instance, Ūi and W̄i are the input preconditions of the homonym method mi in o1 and o2.
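The order ⊑o can be implemented directly from its definition as reconstructed above. Below is a sketch (ours) over a four-point parity-like lattice BOT < EVEN, ODD < TOP; absent methods default to precondition ⊤ and postcondition ⊥, as stipulated above, and the checks confirm that ⊥o and ⊤o are extreme elements:

def leq(a, b):                        # BOT < EVEN, ODD < TOP
    return a == 'BOT' or b == 'TOP' or a == b

def leq_obs(o1, o2):
    # o1 below o2 in the observable order, quantified over o2's methods;
    # a method absent from o1 defaults to precondition TOP and postcondition BOT
    inv1, ini1, ms1 = o1
    inv2, ini2, ms2 = o2
    if not (leq(inv1, inv2) and leq(ini1, ini2)):
        return False
    for m, (w, q, j) in ms2.items():                 # o2's <W, Q> -> J
        u, r, i = ms1.get(m, ('TOP', 'TOP', 'BOT'))  # o1's <U, R> -> I
        if not (leq(w, u) and leq(q, r) and leq(i, j)):
            return False
    return True

even  = ('EVEN', 'EVEN', {'add': ('BOT', 'EVEN', 'EVEN')})
bot_o = ('BOT', 'BOT',  {'add': ('TOP', 'TOP', 'BOT')})
top_o = ('TOP', 'TOP',  {'add': ('BOT', 'BOT', 'TOP')})
assert leq_obs(bot_o, even) and leq_obs(even, top_o)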
From basic abstract interpretation theory [11] we know that A(℘(Σ)), the set of all the abstractions of the concrete domain, is a complete lattice ordered w.r.t. the "relative" precision ≤ of abstract domains. As an immediate consequence, we obtain that Galois connections can be lifted to the domain of observables:

Lemma 1. Let ⟨P, ⊑⟩ and ⟨P′, ⊑′⟩ be two domains in A(℘(Σ)) such that ⟨P, ⊑⟩ ≤ ⟨P′, ⊑′⟩ with the Galois connection ⟨α, γ⟩. Then ⟨O[P], ⊑o[P]⟩ and ⟨O[P′], ⊑o[P′]⟩ are related by the Galois connection ⟨αo, γo⟩, where αo and γo are

αo(⟨S̄, S̄0, {m : ⟨V̄m, B̄m⟩ → S̄m}⟩) = ⟨α(S̄), α(S̄0), {m : ⟨α(V̄m), α(B̄m)⟩ → α(S̄m)}⟩
γo(⟨S̄′, S̄0′, {m : ⟨V̄m′, B̄m′⟩ → S̄m′}⟩) = ⟨γ(S̄′), γ(S̄0′), {m : ⟨γ(V̄m′), γ(B̄m′)⟩ → γ(S̄m′)}⟩.

3.4 Abstract Semantics

Once the abstract domain is defined, an abstraction of [[C]] can be obtained by considering the abstract counterpart of (1). As a first step, we need to consider the abstractions corresponding to the initial states and to the forward and backward collecting semantics. We consider the best abstract counterparts of these concrete semantic functions. By Galois connection properties, the best approximation of the initial states of the class is S̄0 = α(S0). By [11], the best approximation in P of the forward collecting method semantics of a method m of C is [[m]]‾→ ∈ [P → P], defined as [[m]]‾→(S̄) = α ∘ [[m]]→ ∘ γ(S̄). The abstract counterpart of the equations (1) is the following equation system:

S̄ = S̄0 ⊔ ⨆_{m∈M} S̄m
S̄m = [[m]]‾→(S̄),  m ∈ M.     (3)

The above equations are monotonic and, by the Tarski fixpoint theorem, there exists a least solution ⟨S̄, S̄0, {m : S̄m}⟩. Similarly to the concrete case, the abstract preconditions can be obtained by considering the best approximation [[m]]‾← ∈ [P → P] of the backward collecting method semantics, defined as [[m]]‾←(S̄) = α ∘ [[m]]← ∘ γ(S̄). The method abstract preconditions are obtained by projecting [[m]]‾←(S̄m) respectively on the method input values and on the instance fields: V̄m = πin([[m]]‾←(S̄m)) and B̄m = πF([[m]]‾←(S̄m)).

To sum up, the triple [[C]]‾ = ⟨S̄, S̄0, {m : ⟨V̄m, B̄m⟩ → S̄m}⟩ belongs to the domain of observables, and it is the best sound approximation of the semantics of C w.r.t. the properties encoded by the abstract domain ⟨P, ⊑⟩.

Theorem 2 (Observable of a Class). Let ⟨P, ⊑⟩ be an abstract domain that satisfies (2) and let the observable of a class C w.r.t. the property encoded by ⟨P, ⊑⟩ be [[C]]‾ = ⟨S̄, S̄0, {m : ⟨V̄m, B̄m⟩ → S̄m}⟩. Then αo([[C]]) ⊑o [[C]]‾.

Example 1. Let us instantiate ⟨P, ⊑⟩ with Con, the abstract domain of equalities of linear congruences [16]. The elements of such a domain have the form x = a mod b, where x is a program variable and a and b are integers. The representation function γc ∈ [Con → ℘(Σ)] is defined as γc(x = a mod b) = {σ ∈ Σ | ∃k ∈ N. σ(x) = a + k · b}. Let us consider the classes Even, Odd and MultEight in Fig. 1, and let e be the property x = 0 mod 2, d the property x = 1 mod 2 and u the property x = 0 mod 8. Then the observables w.r.t. Con are

[[Even]]‾ = ⟨e, e, {add : ⟨⊥, e⟩ → e, sub : ⟨⊥, e⟩ → e}⟩
[[Odd]]‾ = ⟨d, d, {add : ⟨⊥, d⟩ → d, sub : ⟨⊥, d⟩ → d}⟩
[[MultEight]]‾ = ⟨u, u, {add : ⟨⊥, u⟩ → u, sub : ⟨⊥, u⟩ → u}⟩.

It is worth noting that, as add and sub do not have an input parameter, the corresponding precondition on the input values is ⊥.  □
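The observables of Example 1 can be computed as least fixpoints of (3) in the congruence domain. The sketch below (ours) represents an element x = a mod b of Con as a pair (a, b), with b = 0 encoding the exact value a, and uses the standard congruence join:

from math import gcd

def join(d1, d2):
    # join of x = a1 mod b1 and x = a2 mod b2 (b = 0 encodes the exact value a)
    (a1, b1), (a2, b2) = d1, d2
    g = gcd(gcd(b1, b2), abs(a1 - a2))
    return (a1 % g, g) if g else (a1, 0)

def transfer(d, k):
    # abstract effect of the statement x += k
    a, b = d
    return ((a + k) % b, b) if b else (a + k, 0)

def observable_inv(x0, increments):
    # least fixpoint of (3) for init(){ x = x0 } and methods x += k, k in increments
    inv = (x0, 0)
    while True:
        new = inv
        for k in increments:
            new = join(new, transfer(inv, k))
        if new == inv:
            return inv
        inv = new

assert observable_inv(0, [2, -2])  == (0, 2)   # Even: e, i.e. x = 0 mod 2
assert observable_inv(1, [2, -2])  == (1, 2)   # Odd: d, i.e. x = 1 mod 2
assert observable_inv(0, [16, -8]) == (0, 8)   # MultEight: u, i.e. x = 0 mod 8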
4 Subclassing

The notion of subclassing can be defined both at the semantic and at the syntactic level. Given two classes A and B, A is a syntactic subclass of B, denoted A ◁ B, if all the names defined in B are defined in A too. On the other hand, A is a semantic subclass of B, denoted A ⪯ B, if A preserves the observable of B. The notion of semantic subclassing is useful for exploring the expressive power of specification languages and for the modular verification of object-oriented programs.

4.1 Syntactic Subclassing

The intuition behind the syntactic subclassing relation is inspired by the Smalltalk approach to inheritance: a subclass must answer all the messages sent to its superclass. Stated otherwise, the syntactic subclassing relation is defined in terms of inclusion of class interfaces:

Definition 3 (Syntactic Subclassing). Let A and B be two classes, and ι(·) be as in Def. 2. Then the syntactic subclass relation is defined as A ◁ B ⟺ ι(A) ⊇ ι(B).

It is worth noting that, as ι(·) does not distinguish between names of fields and methods, the class A = ⟨∅, init, {f = λx. x + 1}⟩ is a syntactic subclass of B = ⟨{f}, init, ∅⟩, even if in the first case f is the name of a method and in the second it is the name of a field. This is not surprising in the general, untyped, context we consider.

Example 2. In mainstream object-oriented languages the subclassing mechanism is provided through class extension. For example, in Java a subclass of a base class B is created by using the syntactic construct "A extends B { extension }", where A is the name of the subclass and extension lists the fields and the methods added and/or redefined by the subclass. As a consequence, if type declarations are considered part of the field and method names, then A ◁ B always holds.  □

4.2 Semantic Subclassing

The semantic subclassing relation formalizes the intuition that, up to a given property, a class A behaves like a class B. For example, if the property of interest is the type of the class, then A is a semantic subclass of B if its type is a subtype of B's. In our framework, semantic subclassing can be defined in terms of the preservation of observables. In fact, as ⊑o is the abstract counterpart of logical implication, [[A]]‾ ⊑o [[B]]‾ means that A preserves the semantics of B when a given property of interest is observed. Therefore we can define:

Definition 4 (Semantic Subclassing). Let ⟨O, ⊑o⟩ be an abstract domain of observables and let A and B be two classes. Then the semantic subclassing relation with respect to O is defined as A ⪯O B ⟺ [[A]]‾ ⊑o [[B]]‾.

Example 3. Let us consider the classes Even, Odd and MultEight and their respective observables as in Ex. 1. Then, as u ⊑ e holds, we have that MultEight ⪯ Even. On the other hand, neither e ⊑ d nor d ⊑ e holds. As a consequence, Even ⋠ Odd and Odd ⋠ Even.  □

Observe that when ⟨O, ⊑o⟩ is instantiated with the types abstract domain [9], the relation defined above coincides with the traditional subtyping-based definition of subclassing [2].

Fig. 5. A visualization of the semantic subclassing relation: the classes A and B are mapped by the (abstract) semantics to [[A]]‾ and [[B]]‾, and A ⪯ B exactly when [[A]]‾ ⊑o [[B]]‾.

The relation between classes, concrete semantics and observables can be visualized by the diagram in Fig. 5. When the abstract semantics of A and B are compared, that of A implies that of B. This means that A refines B w.r.t. the properties encoded by the abstract domain O, in accordance with the mundane approach to inheritance where a subclass is a specialization of its ancestors [25]. The next lemma states the monotonicity of ⪯ w.r.t. the observed properties:

Lemma 2. Let A and B be classes, and ⟨P, ⊑⟩ and ⟨P′, ⊑′⟩ be abstract domains in A(℘(Σ)) such that ⟨P, ⊑⟩ ≤ ⟨P′, ⊑′⟩. If A ⪯O[P] B then A ⪯O[P′] B.

By Lemma 2, the more precise the domain of observables, the more precise the induced subclass relation. If we observe a more precise property of the class semantics, then we are able to better distinguish between the different classes.
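Definition 4 can be checked mechanically once the order of the underlying domain is implemented. A sketch (ours) of the order of Con, replaying the comparisons of Example 3:

def leq_con(d1, d2):
    # x = a1 mod b1 implies x = a2 mod b2; b = 0 encodes the single value a
    (a1, b1), (a2, b2) = d1, d2
    if b2 == 0:
        return b1 == 0 and a1 == a2
    return b1 % b2 == 0 and (a1 - a2) % b2 == 0

e, d, u = (0, 2), (1, 2), (0, 8)
assert leq_con(u, e)                             # u below e: MultEight is a semantic subclass of Even
assert not leq_con(e, d) and not leq_con(d, e)   # Even and Odd are unrelated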
Example 4. Let us consider the hierarchies H1 and H2 depicted in Fig. 2. As the domain of congruences is (strictly) more precise than the domain of parities, H1 is also admissible for parities, by Lemma 2. Observe that in general the converse is not true: for instance, H2 is not admissible for congruences.  □

When considering the identity Galois connection ⟨λx. x, λx. x⟩, Def. 4 boils down to the observation of the concrete semantics, so that, by Lemma 2, ⪯O[℘(Σ)] is the most precise semantic subclassing relation. Furthermore, the semantic subclass relation induced by the most abstract domain is the trivial one, in which all classes are in relation with all others. As a consequence, given two classes A and B, there always exists an abstract domain of observables O such that A ⪯O B. However, in general there does not exist a least domain of observables such that the two are in the semantic subclass relation, as shown by the following example:

Example 5. Let us consider two classes A and B that are equal except for a method m, defined as:

A.m() {
  x = 1; y = 2;
  if (x > 0) && (y % 2 == 0) { x = 1; y = 4; }
  else                       { x = 1; y = 8; }
}

B.m() {
  x = 1; y = 2;
  if (x > 0) && (y % 2 == 0) { x = 1; y = 2;  }
  else                       { x = 3; y = 10; }
}

When considering the domain of intervals [10] as observables, we infer that A ⪯Intervals B, as ([1, 1], [4, 8]) ⊑ ([1, 3], [2, 10]); and when considering the domain of parities as observables, we infer that A ⪯Parities B, as (odd, even) ⊑ (odd, even). In fact, in both cases the abstract domain is not precise enough to capture the branch chosen by the conditional statement. Nevertheless, when considering the reduced product [11] Intervals × Parities, we have that A ⋠Intervals×Parities B, as (([1, 1], odd), ([4, 4], even)) ⋢ (([1, 1], odd), ([2, 2], even)). As a consequence, if there existed a least domain O such that A ⪯O B, then O should be strictly smaller than both Intervals and Parities, as the two domains are not comparable. Then O would have to be smaller than or equal to the reduced product of the two domains. We have just shown that it cannot be equal, and by Lemma 2 it follows that it cannot be smaller either.  □

Observation. The previous example emphasizes a strict link between the concept of subtyping in specification languages for object-oriented programs and the notion of abstraction. Let us consider two classes A and B, two specification languages L1 and L2, and the strongest properties we can express about the behavior of A and B in L1 and in L2, say respectively ϕA1, ϕB1 and ϕA2, ϕB2. Let us suppose that ϕA1 ⇒ ϕB1 and ϕA2 ⇒ ϕB2. By the definition of behavioral subtyping [18], A is a subclass of B in both L1 and L2. Nevertheless, by Ex. 5, by the definition of observable of a class, and by basic abstract interpretation theory [8], it follows that if we consider a specification language L3 expressive enough to contain both L1 and L2, and the corresponding strongest properties ϕA3, ϕB3, then ϕA3 ⇏ ϕB3. This means that when a more expressive language is used, the classes A and B are no longer related. This fact enlightens an interesting trade-off between the expressive power of specification languages for object-oriented programs and the granularity of the subtyping relation they support.
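The comparisons of Example 5 can be replayed abstractly. In the sketch below (ours), the boolean passed to run is a stand-in for the domain's ability to prove the guard (x > 0) && (y % 2 == 0): neither Intervals nor Parities alone has a precise abstract test for both conjuncts, while the reduced product does.

def itv(v):         return (v, v)
def itv_join(p, q): return tuple((min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(p, q))
def par(v):         return v % 2                  # 0 = even, 1 = odd
def par_join(p, q): return tuple(a if a == b else 'top' for a, b in zip(p, q))

def run(then_v, else_v, abst, join, proves_guard):
    # abstract run of m after x = 1; y = 2: keep the then-branch if the guard
    # is proved, otherwise join the two branches
    return abst(then_v) if proves_guard else join(abst(then_v), abst(else_v))

itv_abs = lambda s: tuple(itv(v) for v in s)
par_abs = lambda s: tuple(par(v) for v in s)
A_then, A_else = (1, 4), (1, 8)
B_then, B_else = (1, 2), (3, 10)

assert run(A_then, A_else, itv_abs, itv_join, False) == ((1, 1), (4, 8))   # vs B's ((1,3),(2,10))
assert run(A_then, A_else, par_abs, par_join, False) == (1, 0)             # (odd, even), same as B's
assert run(A_then, A_else, itv_abs, itv_join, True)  == ((1, 1), (4, 4))   # product decides the guard
assert run(B_then, B_else, itv_abs, itv_join, True)  == ((1, 1), (2, 2))   # [4,4] not below [2,2]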
4.3 Modular Verification of Polymorphic Code

Thanks to the following lemma, the notion of semantic subclass turns out to be useful for the modular verification of polymorphic code:

Lemma 3 (Modular Verification). Let ⟨P, ⊑⟩ be an abstract domain, let g ∈ [O[P] → P] be a monotonic function and let f ∈ [C → P] be defined as f = λB. g([[B]]‾). Then A ⪯O[P] B implies that f(A) ⊑ f(B).

Let us consider a function "m(B b) { bodym }". The best abstract semantics [11] of m w.r.t. a given abstract domain ⟨P, ⊑⟩ is [[m]]‾.² By Galois connection properties, [[m]]‾ is a monotonic function. Let o ∈ O[P]. We define g as the function obtained from [[m]]‾ by replacing each occurrence of an invocation of a method of b, e.g., b.n(...), inside bodym with the corresponding preconditions and postconditions of o [14]. We denote it by [[m[b → o]]]‾. Hence, g = λo. [[m[b → o]]]‾ is a monotonic function and, in particular, [[m]]‾ ⊑ g([[B]]‾), as [[B]]‾ is an approximation of the behavior of b in all the possible contexts. Then we can apply Lemma 3, so that for every class A with A ⪯O[P] B we have that g([[A]]‾) ⊑ g([[B]]‾). As a consequence, if we can prove that g([[B]]‾) ⊑ S̄ for a given specification S̄, by transitivity it follows that g([[A]]‾) ⊑ S̄ for every semantic subclass A ⪯O[P] B, i.e., m is correct w.r.t. the specification S̄.

Example 6. Consider the function inv in Sect. 2. We want to prove that inv never performs a division by zero. Let us instantiate P with the parity abstract domain. By Ex. 1 we know that x = e. By an abstract evaluation of the return expression, one obtains 1/(1 − e%2) = 1/d, which is always defined (as obviously zero is not an odd number). As a consequence, when an instance of Even is passed to inv, it does not throw any division-by-zero exception. Furthermore, by what was said above, this is true for all the semantic subclasses of Even.  □

4.4 Relation Between ◁ and ⪯

Consider two classes A and B such that A ◁ B. By definition, this means that all the names (of fields or methods) defined in B are defined in A too. In general, such a condition is too weak to state anything "interesting" about the semantics of A w.r.t. that of B: as seen before, there exists a domain of observables O such that A ⪯O B, but in most cases such a domain is the most abstract one, and by Lemma 2 this implies that ⪯O is an uninteresting relation. Therefore, in order to obtain more interesting subclass relations, we have to make some hypotheses on the abstract semantics of the methods of the class. If the constructor of a class A is compatible with that of B, and if the methods of A do not violate the class invariant of B, then A is a semantic subclass of B. On the other hand, semantic subclassing almost implies syntactic subclassing. This is formalized by the following theorems [24]:

Theorem 3. Let A = ⟨FA, initA, MA⟩ and B = ⟨FB, initB, MB⟩ be two classes such that A ◁ B, and let ⟨P, ⊑⟩ ∈ A(℘(Σ)). If (i) ĪB is a class invariant for B, (ii) [[initA]]‾→ ⊑ [[initB]]‾→, (iii) ∀S̄ ∈ P. ∀m ∈ MA ∩ MB. [[m]]‾→(S̄) ⊑ ĪB, and (iv) ∀m ∈ MA. m ∉ MB ⟹ ∀S̄ ∈ P. [[m]]‾→(S̄) ⊑ ĪB, then A ⪯O[P] B.

Theorem 4. Let A, B ∈ C be such that A ⪯O B. Then there exists a renaming function φ such that φ(A) ◁ B.

² We consider the best abstract function in order to simplify the exposition. Nevertheless, the paper's results still hold when a generic upper-approximation is considered.
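The parity argument of Example 6 amounts to a two-step abstract evaluation. A minimal sketch (ours):

EVEN, ODD, TOP = 'even', 'odd', 'top'

def mod2(p):                      # x % 2 maps an even x to 0 and an odd x to 1
    return p if p in (EVEN, ODD) else TOP

def one_minus(p):                 # 1 - t flips parity
    return {EVEN: ODD, ODD: EVEN}.get(p, TOP)

x = EVEN                          # the invariant e of Even (Ex. 1)
denominator = one_minus(mod2(x))
assert denominator == ODD         # an odd value is never zero: no division by zero,
                                  # for Even and, by Lemma 3, for all its semantic subclasses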
5 Meaning-Preserving Manipulation of Class Hierarchies

In this section, we exploit the results of the previous sections to introduce the concept of admissible class hierarchy, and to define and prove correct some operators on class hierarchies.

5.1 Admissible Semantic Class Hierarchies

For basic definitions on trees, the reader may refer to [6]. If T is a tree, nodesOf(T) denotes the elements of the tree, rootOf(T) denotes the root of the tree and, if n ∈ nodesOf(T), then sonsOf(n) are the successors of the node n. In particular, if sonsOf(n) = ∅ then n is a leaf. A tree with root r and successors S is tree(r, S). Here we only consider single inheritance, so that class hierarchies are trees of classes. An admissible hierarchy w.r.t. a transitive relation ρ on classes is a tree such that all the nodes are classes and, given two nodes n and n′ such that n′ ∈ sonsOf(n), n′ is in the relation ρ with n. Formally:

Definition 5 (Admissible Class Hierarchy). Let H be a tree and ρ ⊆ C × C be a transitive relation on classes. Then we say that H is a class hierarchy which is admissible w.r.t. ρ if (i) nodesOf(H) ⊆ C, and (ii) ∀n ∈ nodesOf(H). ∀n′ ∈ sonsOf(n). n′ ρ n. We denote the set of all the class hierarchies admissible w.r.t. ρ by H[ρ].

It is worth noting that our definition subsumes the definition of class hierarchies of mainstream object-oriented languages. In fact, when ρ is instantiated with ◁, we obtain class hierarchies in which all the subclasses have at least the same methods as their superclass. A semantic class hierarchy is just the instantiation of Def. 5 with the relation ⪯. The theorems and lemmata of the previous sections can be easily lifted to class hierarchies.

Example 7. Consider the two hierarchies in Fig. 2. H1 is admissible w.r.t. ⪯Con, and H2 is admissible w.r.t. ⪯Parities but not w.r.t. ⪯Con.  □

In order to manipulate hierarchies we wish to preserve admissibility. This is why we need the notion of a fair operator. A fair operator on class hierarchies transforms a set of class hierarchies admissible w.r.t. a relation ρ into a class hierarchy that is admissible w.r.t. a relation ρ′.

Definition 6 (Fair Operator). Let ρ and ρ′ be transitive relations. Then we say that a function t is a fair operator w.r.t. ρ and ρ′ if t ∈ [℘(H[ρ]) → H[ρ′]].

In the following, when not stated otherwise, we assume that ρ = ρ′ = ⪯.

5.2 Class Insertion

The first fair operator we consider is the one for adding a class to an admissible class hierarchy. The algorithmic definition of such an operator is presented in Fig. 6. It uses as a subroutine CSS, an algorithm for computing a common semantic superclass of two given classes, which is depicted in Fig. 7 and discussed in the next section.

The insertion algorithm takes as input an admissible class hierarchy H and a class C. Four cases are distinguished. (i) If C already belongs to H, then the hierarchy is unchanged. (ii) If C is a superclass of the root of H, then a new class hierarchy whose root is C is returned. (iii) If C is a subclass of the root of H, then the insertion must preserve the admissibility of the hierarchy. If C is a superclass of some of the successors, then it is inserted between the root of H and such successors. Otherwise, the algorithm checks whether some root class of the successors is a superclass of C. If this is the case, then the algorithm is recursively applied; otherwise C is added at this level of the hierarchy. (iv) If C and the root of H are unrelated, the algorithm returns a new hierarchy whose root is a superclass of both C and the root of H.

H ⊕ C ≜
  let R = rootOf(H), S = sonsOf(R)
  let H< = {K ∈ S | rootOf(K) ⪯ C}
  let H> = {K ∈ S | C ⪯ rootOf(K)}
  if C ∈ nodesOf(H) then return H
  if R ⪯ C then return tree(C, {H})
  if C ⪯ R then
    if H< ≠ ∅ then return tree(R, (S − H<) ∪ {tree(C, H<)})
    if H> ≠ ∅ then select K ∈ H>
                   return tree(R, (S − {K}) ∪ {K ⊕ C})
    else return tree(R, S ∪ {C})
  else
    select C′ = CSS(R, C)
    return tree(C′, {H, C})

Fig. 6. The algorithm for a fair class insertion
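Definition 5, which the operator of Fig. 6 must preserve, translates directly into a recursive check on trees. Below is a sketch (ours) verifying the admissibility of H1 w.r.t. ⪯Con, abbreviating each class by the pair (a, b) of its congruence invariant:

def admissible(tree, rho):
    # Definition 5 on trees (cls, [subtrees]): each child in relation rho with its parent
    cls, sons = tree
    return all(rho(t[0], cls) and admissible(t, rho) for t in sons)

def rho_con(c, p):                # semantic subclassing w.r.t. congruences
    return c[1] % p[1] == 0 and (c[0] - p[0]) % p[1] == 0

Integer, Even, Odd, M8, M12 = (0, 1), (0, 2), (1, 2), (0, 8), (0, 12)
H1 = (Integer, [(Odd, []), (Even, [(M8, []), (M12, [])])])
assert admissible(H1, rho_con)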
CSS(A, B) ≜
  let A = ⟨FA, initA, MA⟩, B = ⟨FB, initB, MB⟩
  FC = ∅, initC = initA, MC = ∅
  repeat
    select f ∈ FA − FC
      if B ⪯ ⟨FC ∪ {f}, τ_{FC∪{f}}(initA), τ_{FC∪{f}}(MC)⟩
        then FC = FC ∪ {f}, initC = τ_{FC∪{f}}(initA)
    select m ∈ MA − MC
      if B ⪯ ⟨FC, initC, τ_{FC}(MC ∪ {m})⟩
        then MC = MC ∪ {m}
  until no more fields or methods are added
  return ⟨FC, initC, τ_{FC}(MC)⟩

Fig. 7. Algorithm for computing the CSS

The soundness of the algorithm follows from the observation that, if in the input hierarchy there is an admissible path from a class B to a class A, then in the extended hierarchy there still exists an admissible path from B to A.

Lemma 4 (Soundness of ⊕ [24]). The operator ⊕ defined in Fig. 6 is a fair operator w.r.t. ⪯, i.e., ⊕ ∈ [H[⪯] × C → H[⪯]].

Example 8. Consider the hierarchy H1 and the classes MultFour and MultTwenty of Sect. 2. Then (H1 ⊕ MultFour) ⊕ MultTwenty = H3 of Fig. 4.  □

Because of Th. 1, the algorithm ⊕ is effective as soon as the underlying domain of observables is suitable for a static analysis, i.e., the abstract elements are computer-representable, the order ⊑ on P is decidable, and a widening operator ensures the convergence of the fixpoint computation. The dual operator, i.e., the elimination of a class from a hierarchy, corresponds straightforwardly to the algorithm for removing a node from an ordered tree [6].

5.3 Common Semantic Superclass

The previous section left us to define (and prove correct) the algorithm that returns a common semantic superclass (CSS) of two given classes. First we recall the definition of a meaning-preserving transformation τ [12]:

Definition 7 (Program Transformation). Let A = ⟨F, init, M⟩ and let ⟨α, γ⟩ be a Galois connection satisfying (2). A meaning-preserving program transformation τ ∈ [F → M → M] is such that ∀f ∈ F. ∀m ∈ M: (i) τf(m) does not contain the field f, and (ii) ∀d̄ ∈ P. α([[m]]→(γ(d̄))) ⊑ α([[τf(m)]]→(γ(d̄))).

Intuitively, τf(m) projects out the field f from the source of m, preserving the semantics up to an observation (i.e., α). The algorithm CSS is presented in Fig. 7. It is parameterized by the underlying abstract domain of observables and a meaning-preserving map τ. The algorithm starts with a superclass of A (i.e., ⟨∅, initA, ∅⟩). Then it iterates by non-deterministically adding, at each step, a field or a method of A: if such an addition produces a superclass of B, then it is retained, otherwise it is discarded. When no more methods or fields can be added, the algorithm returns a semantic superclass of A and B, as guaranteed by the following theorem:

Theorem 5 (Soundness of CSS). Let A and B be two classes. Then CSS(A, B) is such that A ⪯ CSS(A, B) and B ⪯ CSS(A, B).

It is worth noting that in general CSS(A, B) ≠ CSS(B, A). Furthermore, by Th. 1, it follows that if ⪯ is decidable then the algorithm is effective. This is the case when the underlying abstract domain of observables corresponds to one used for a static analysis [20].

Example 9. Consider the classes MultEight, MultTwelve and MultFour defined as in Sect. 2. When using the abstract domain of linear congruences, CSS(MultEight, MultTwelve) = MultFour.  □
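For classes built from a common add/sub template, the CSS computation in Con has a simple arithmetic core: the least property implied by both x = 0 mod b1 and x = 0 mod b2 is x = 0 mod gcd(b1, b2). The sketch below (ours, a deliberate simplification of the full algorithm in Fig. 7) reconstructs Example 9; make_mult_class is a hypothetical helper producing a MultN-style source template.

from math import gcd

def css_modulus(b1, b2):
    # least property implied by both x = 0 mod b1 and x = 0 mod b2
    return gcd(b1, b2)

def make_mult_class(b):            # hypothetical helper: a MultN source template
    return {'fields': ['x'], 'init': 'x = 0',
            'methods': {'add': 'x += %d' % b, 'sub': 'x -= %d' % b}}

assert css_modulus(8, 12) == 4     # Example 9: CSS(MultEight, MultTwelve) ~ MultFour
print(make_mult_class(css_modulus(8, 12)))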
5.4 Merging of Hierarchies

The last refactoring operation on hierarchies we consider is merging. The algorithm ⊕ can be used as the basis of an algorithm for merging two admissible class hierarchies:

H1 ⊎ H2 ≜
  let H = H1, N = nodesOf(H2)
  while N ≠ ∅ do
    select C ∈ N
    H = H ⊕ C, N = N − {C}
  return H.

Lemma 5. ⊎ is a fair operator w.r.t. ⪯, i.e., ⊎ ∈ [H[⪯] × H[⪯] → H[⪯]].

It is worth mentioning that the modularity and tunability of the operators described in this section are the crucial keys that allow applying them also to "real world" hierarchy management issues [33].

6 Related Work

In their seminal work on Simula [13], Dahl and Nygaard justified the concept of inheritance on syntactic bases, namely as textual concatenation of program blocks. A first semantic approach is [15], where an (informal) operational approach to the semantics of inheritance is introduced. In particular, the problem of specifying the semantics of message dispatch is reduced to that of method lookup. In [5] a denotational characterization of inheritance is introduced and proved correct w.r.t. an operational semantics based on the method lookup algorithm of [15]. A unifying view of the different forms of inheritance provided by programming languages is presented in [1]. In the objects-as-records model [2], the semantics of an object is abstracted by its type: inheritance is identified with subtyping. Such an approach is not fully satisfactory, as shown in [4]. The notion of subtyping has been generalized in [18], where inheritance is seen as property preservation: the behavioral type of a class is a human-provided formula which specifies the behavior of the class, and subclassing boils down to formula implication. The main difference between our concept of observable and that of behavioral type is that observables are systematically obtained as an abstraction of the class semantics instead of being provided by the programmer.

As for class hierarchy refactoring, [32] presents a semantics-preserving approach to class composition. Such an approach preserves the behavior of the composed hierarchies when they do not interfere. If they do interfere, a static analysis determines which components (classes, methods, etc.) of the hierarchies may interfere, given a set of programs that use such hierarchies. This approach is the basis of [33], which exploits static and dynamic information for class refactoring. The main difference between these works and ours is that we exploit the notion of observable, which is a property valid for all the instantiation contexts of a class. As a consequence, we do not need to rely on a set of test programs for inferring hierarchy properties. Furthermore, as a soundness requirement, we ask that a refactoring operator on a class hierarchy preserve the observable, i.e., an abstraction of the concrete semantics. As a consequence, we are in a more general setting, and the traditional one is recovered as soon as we consider the domain of observables to be the concrete one.

7 Conclusions and Future Work

We introduced a framework for the definition and the manipulation of class hierarchies based on semantics abstraction. The main novelty of this approach is twofold: it provides a solid, logic-based foundation for class refactoring operations that are safe by construction, and it can be tuned according to the observed property. The next goal is the development of a tool for the semi-automatic refactoring of class hierarchies, based on [19, 21], and the design of abstract domains capturing properties expressible in JML [17].
Acknowledgments. Thanks to Radhia Cousot for her kind support. This work was partly supported by the École Polytechnique and by the MIUR projects COFIN 2004013015 and FIRB RBAU018RCZ.

References

1. G. Bracha and W. R. Cook. Mixin-based inheritance. In 5th ACM Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA '90), volume 25(10) of SIGPLAN Notices, pages 303–311, October 1990.
2. L. Cardelli. A semantics of multiple inheritance. In G. Kahn, D. MacQueen, and G. Plotkin, editors, Semantics of Data Types, volume 173 of Lecture Notes in Computer Science, pages 51–67, Berlin, 1984. Springer-Verlag. Full version in Information and Computation, 76(2/3):138–164, 1988.
3. G. Castagna. Covariance and contravariance: conflict without a cause. ACM Transactions on Programming Languages and Systems, 17(3):431–447, March 1995.
4. W. R. Cook, W. Hill, and P. S. Canning. Inheritance is not subtyping. In Proceedings of the 17th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '90). ACM Press, January 1990.
5. W. R. Cook and J. Palsberg. A denotational semantics of inheritance and its correctness. Information and Computation, 114(2):329–350, November 1994.
6. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, Second Edition. The MIT Press and McGraw-Hill Book Company, 2001.
7. A. Cortesi and F. Logozzo. Abstract interpretation-based verification of non-functional requirements. In Proceedings of the 7th International Conference on Coordination Models and Languages (COORD '05), volume 3654 of Lecture Notes in Computer Science, pages 49–62. Springer-Verlag, April 2005.
8. P. Cousot. Methods and logics for proving programs. In J. van Leeuwen, editor, Formal Models and Semantics, volume B of Handbook of Theoretical Computer Science, chapter 15, pages 843–993. Elsevier Science, 1990.
9. P. Cousot. Types as abstract interpretations, invited paper. In 24th ACM Symposium on Principles of Programming Languages (POPL '97), pages 316–331. ACM Press, January 1997.
10. P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th ACM Symposium on Principles of Programming Languages (POPL '77), pages 238–252. ACM Press, January 1977.
11. P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In 6th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '79), pages 269–282. ACM Press, 1979.
12. P. Cousot and R. Cousot. Systematic design of program transformation frameworks by abstract interpretation. In 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '02), pages 178–190. ACM Press, New York, NY, January 2002.
13. O. Dahl and K. Nygaard. SIMULA - an ALGOL-based simulation language. Communications of the ACM (CACM), 9(9):671–678, September 1966.
14. D. L. Detlefs, K. Rustan M. Leino, G. Nelson, and J. B. Saxe. Extended static checking. Research Report #159, Compaq Systems Research Center, Palo Alto, USA, December 1998.
15. A. Goldberg and D. Robson. Smalltalk-80: The Language and Its Implementation. Addison-Wesley, 1983.
16. P. Granger. Static analysis of linear congruence equalities among variables of a program. In International Joint Conference on Theory and Practice of Software Development (TAPSOFT '91), volume 464 of Lecture Notes in Computer Science, pages 169–192. Springer-Verlag, April 1991.
17. G. T. Leavens, A. L. Baker, and C. Ruby.
Preliminary Design of JML: A Behavioral Interface Specification Language for Java, November 2003.
18. B. H. Liskov and J. M. Wing. A behavioral notion of subtyping. ACM Transactions on Programming Languages and Systems, 16(6):1811–1841, November 1994.
19. F. Logozzo. Class-level modular analysis for object oriented languages. In Proceedings of the 10th Static Analysis Symposium (SAS '03), volume 2694 of Lecture Notes in Computer Science, pages 37–54. Springer-Verlag, June 2003.
20. F. Logozzo. An approach to behavioral subtyping based on static analysis. In Proceedings of the International Workshop on Test and Analysis of Component Based Systems (TACoS 2004), Electronic Notes in Theoretical Computer Science. Elsevier Science, April 2004.
21. F. Logozzo. Automatic inference of class invariants. In Proceedings of the 5th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI '04), volume 2937 of Lecture Notes in Computer Science, pages 211–222. Springer-Verlag, January 2004.
22. F. Logozzo. Modular Static Analysis of Object-oriented Languages. PhD thesis, École Polytechnique, 2004.
23. F. Logozzo. Class invariants as abstract interpretation of trace semantics. Computer Languages, Systems and Structures, 2005 (to appear).
24. F. Logozzo and A. Cortesi. Semantic class hierarchies by abstract interpretation. Technical Report CS-2004-7, Dipartimento di Informatica, Università Ca' Foscari di Venezia, Italy, 2004.
25. B. Meyer. Object-Oriented Software Construction (2nd Edition). Professional Technical Reference. Prentice Hall, 1997.
26. A. Miné. The octagon abstract domain. In AST 2001 in WCRE 2001, IEEE, pages 310–319. IEEE CS Press, October 2001.
27. NetBeans.org and Sun Microsystems, Inc. NetBeans IDE, 2004.
28. P. Odifreddi. Classical Recursion Theory. Elsevier, Amsterdam, 1999.
29. J. Palsberg and M. I. Schwartzbach. Object-Oriented Type Systems. John Wiley & Sons, Chichester, 1994.
30. I. Pollet, B. Le Charlier, and A. Cortesi. Distinctness and sharing domains for static analysis of Java programs. In Proceedings of the European Conference on Object Oriented Programming (ECOOP '01), volume 2072 of Lecture Notes in Computer Science, pages 77–98. Springer-Verlag, 2001.
31. S. Sagiv, T. W. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM Transactions on Programming Languages and Systems, 24(3):217–288, 2002.
32. G. Snelting and F. Tip. Semantics-based composition of class hierarchies. In Proceedings of the 16th European Conference on Object-Oriented Programming (ECOOP '02), volume 2374 of Lecture Notes in Computer Science, pages 562–584. Springer-Verlag, June 2002.
33. M. Streckenbach and G. Snelting. Refactoring class hierarchies with KABA. In Proceedings of the 19th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '04). ACM Press, 2004.

Strong Preservation of Temporal Fixpoint-Based Operators by Abstract Interpretation

Francesco Ranzato and Francesco Tapparo

Dipartimento di Matematica Pura ed Applicata, Università di Padova, Italy

Abstract. Standard abstract model checking relies on abstract Kripke structures, which approximate the concrete model by gluing together indistinguishable states. Strong preservation for a specification language L encodes the equivalence of concrete and abstract model checking of formulas in L.
Abstract interpretation allows one to design abstract models which are more general than abstract Kripke structures. In this paper we show how abstract interpretation-based models can be exploited in order to specify a general strongly preserving abstract model checking framework. This is shown in particular for specification languages including standard temporal operators which admit a characterization as least/greatest fixpoints, such as the standard "Finally", "Globally", "Until" and "Release" modalities.

1 Introduction

Abstract model checking is one successful and practical way to deal with the well-known state explosion problem of model checking in system verification [1, 3]. Standard abstract model checking [2] relies on abstract models which are based on partitions of the state space. Given a concrete model as a Kripke structure K = (Σ, →), a standard abstract model is specified by an abstract Kripke structure A = (A, →♯), where the set A of abstract states is defined by a surjective map h : Σ → A and →♯ is an abstract transition relation on A. Thus, A determines a partition PA of Σ and vice versa. A weak preservation result for some temporal language L guarantees that for any formula ϕ ∈ L, if ϕ holds on the abstract model A then ϕ also holds on the concrete model K. On the other hand, strong preservation means that any formula of L holds on A if and only if it holds on K. Strong preservation is highly desirable, since it allows one to draw consequences from negative answers on the abstract side [3]. Thus, in order to design a standard abstract model, we need both an appropriate partition of the state space and a suitable abstract transition relation.

The relationship between abstract interpretation and abstract model checking has been the subject of a number of works (see e.g. [2, 6, 7, 9, 10, 11, 15, 16, 19, 18]). We introduced in [17] an abstract interpretation-based framework for specifying generic strongly preserving abstract models, where a partition of the state space Σ is viewed as a particular abstract domain of the powerset ℘(Σ), and ℘(Σ) plays the role of concrete semantic domain. This generalized approach leads to a precise correspondence between forward complete abstract interpretations and strongly preserving abstract models. We deal with generic (temporal) languages L of state formulas which are inductively generated by a set AP of atomic propositions p and a set Op of operators f, i.e., L ∋ ϕ ::= p | f(ϕ1, ..., ϕn). A semantic interpretation p ⊆ Σ of atomic propositions and of operators f : ℘(Σ)^n → ℘(Σ) determines a concrete semantic function [[·]] : L → ℘(Σ), where [[p]] = p and [[f(ϕ1, ..., ϕn)]] = f([[ϕ1]], ..., [[ϕn]]). Thus, any abstract domain A of ℘(Σ) and corresponding abstract interpretation p♯ ∈ A and f♯ : A^n → A of constants/operators, denoted by I♯, induce an abstract semantic function [[·]]A : L → A, where [[p]]A = p♯ and [[f(ϕ1, ..., ϕn)]]A = f♯([[ϕ1]]A, ..., [[ϕn]]A). In particular, the abstract interpretation of p and f can be given by their best correct approximations on A, i.e., p^A ≜ α(p) and f^A ≜ α ∘ f ∘ γ, where α and γ are the abstraction and concretization maps relating A to ℘(Σ).
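Best correct approximations can be illustrated on a toy parity abstraction. In the Python sketch below (ours), the finite γ is a stand-in for the real, infinite concretization, so the computation is only indicative:

def gamma(p):                               # finite stand-in for the concretization
    return {'even': {0, 2, 4}, 'odd': {1, 3, 5}}.get(p, {0, 1, 2, 3, 4, 5})

def alpha(S):
    ps = {'even' if x % 2 == 0 else 'odd' for x in S}
    return ps.pop() if len(ps) == 1 else 'top'

def best_approx(f):                         # f^A = alpha . f . gamma
    return lambda p: alpha({f(x) for x in gamma(p)})

succ = best_approx(lambda x: x + 1)
assert succ('even') == 'odd' and succ('odd') == 'even'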
In this generalized setting, strong preservation goes as follows: the abstract interpretation (A, I♯) is strongly preserving for L when, for any S ⊆ Σ and ϕ ∈ L, S ⊆ [[ϕ]] ⇔ α(S) ≤ [[ϕ]]A. When A is an abstract domain representing a partition of Σ, this boils down to standard strong preservation for abstract Kripke structures, where different choices for the abstract transition relation →♯ correspond to different abstract interpretations of the operators f.

It turns out that forward completeness implies strong preservation, i.e., if the abstract domain A is forward complete for the concrete constants/operators of L — meaning that no loss of precision occurs by approximating each p and f on the abstract domain A — then A is strongly preserving for L. The converse is in general not true. However, we show that when A is L-covered — meaning that each abstract value a ∈ A corresponds to some formula ϕ ∈ L, i.e., γ(a) = [[ϕ]] — forward completeness and strong preservation are indeed equivalent notions, and consequently the interpretation of the constants/operators of L by their best correct approximations on A is the only possible choice in order to have strong preservation. One interesting point to remark is that when the abstract domain is a state partition P, an abstract transition relation →♯ on P such that the abstract Kripke structure (P, →♯) strongly preserves L might not exist, while, in contrast, a strongly preserving abstract semantics on the partition P viewed as an abstract domain always exists.

The abstract semantics is therefore defined by approximating the interpretation of the logical/temporal operators of L through their best correct approximations on the abstract domain A. In principle, this can be done for any logical/temporal operator. However, when a temporal operator f can be expressed as a least/greatest fixpoint of another temporal operator g, e.g., f = λX. lfp(λY. g(X, Y)), the best correct approximation α ∘ f ∘ γ might not be characterizable as a least/greatest fixpoint. For example, the existential "Finally" operator can be characterized as a least fixpoint by EF(X) = lfp(λY. X ∪ EX(Y)), where EX = pre→ is the standard predecessor transformer on the concrete Kripke structure. The best correct approximation of EF on an abstract domain A is therefore the abstract function α ∘ EF ∘ γ : A → A. However, this definition gives us no clue for computing α ∘ EF ∘ γ as a least fixpoint. By contrast, in standard abstract model checking the abstract interpretation of language operators is based on an abstract Kripke structure A = (P, →♯), so that it is enough to compute the least fixpoint lfp(λY♯. X♯ ∪ EX♯(Y♯)) on the abstract state space P, where X♯ and Y♯ are sets of blocks of P, ∪ is union of sets of blocks and EX♯ = pre→♯ is the predecessor transformer on A. For example, for the language L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | EFϕ, if one can define a strongly preserving abstract Kripke structure (P, →♯), where P is some partition of Σ, then the abstract Kripke structure (P, →∃∃) strongly preserves L as well, where B1 →∃∃ B2 iff ∃s1 ∈ B1. ∃s2 ∈ B2. s1 → s2. In this case, while the concrete fixpoint is given by EF(X) = lfp(λY. X ∪ pre→(Y)), the abstract fixpoint is EF♯(X♯) = lfp(λY♯. X♯ ∪ pre∃∃(Y♯)).
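The concrete least-fixpoint characterization of EF can be computed by Kleene iteration. A minimal sketch (ours) on a three-state transition system:

R = {(1, 2), (2, 3), (3, 3)}                 # a small total transition relation

def pre(Y):                                  # EX: the predecessor transformer
    return {s for (s, t) in R if t in Y}

def EF(X):                                   # EF(X) = lfp(lambda Y. X U pre(Y))
    Y = set()
    while X | pre(Y) != Y:
        Y = X | pre(Y)
    return Y

assert EF({3}) == {1, 2, 3}                  # every state can reach state 3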
The key point here is that the best correct approximation of the concrete function λ⟨X, Y⟩. X ∪ pre→(Y) on the partition P viewed as an abstract domain is indeed λ⟨X♯, Y♯⟩. X♯ ∪ pre∃∃(Y♯). In other terms, the best correct approximation of λX. lfp(λY. X ∪ pre→(Y)) can be expressed as λX♯. lfp(λY♯. X♯ ∪ pre∃∃(Y♯)) and thus preserves the same "template" as the concrete fixpoint function. We generalize this phenomenon to generic functions and abstract domains, and then apply it to standard temporal operators which can be expressed as fixpoints, that is, the "Finally", "Globally", "Until" and "Release" modalities. We apply our results both to partitions, namely standard abstract models, and to disjunctive abstract domains, namely domains which are able to represent logical disjunction precisely. As far as partitions are concerned, we obtain new results of strong preservation on standard abstract Kripke structures. On the other hand, applications to disjunctive abstract domains provide a new procedure to perform a strongly preserving abstract model checking. This latter approach seems especially interesting, because examples hint that efficient implementations are feasible.
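This preserved fixpoint template can be observed concretely: on a partition, the abstract fixpoint built from pre∃∃ computes exactly the blocks covering the concrete EF. A sketch (ours):

R = {(1, 2), (2, 3), (3, 3)}
P = [frozenset({1, 2}), frozenset({3})]

def block(s): return next(B for B in P if s in B)

R_ee = {(block(s), block(t)) for (s, t) in R}   # the abstract relation ->EE
def pre(Y):    return {s for (s, t) in R if t in Y}
def pre_ee(Y): return {B for (B, C) in R_ee if C in Y}

def lfp(F):
    Y = set()
    while F(Y) != Y:
        Y = F(Y)
    return Y

X = {3}
EF_concrete = lfp(lambda Y: X | pre(Y))                # {1, 2, 3}
EF_abstract = lfp(lambda Y: {block(3)} | pre_ee(Y))    # both blocks of P
assert EF_concrete == set().union(*EF_abstract)        # the same fixpoint template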
2 Background

Notation. The standard pointwise ordering between functions will be denoted by ⊑. For a set S ∈ ℘(℘(X)), we write the sets in S in a compact form, like in {[1], [12], [123]} ∈ ℘(℘({1, 2, 3})). We denote by ∁ the complement operator w.r.t. some universe set. Part(Σ) denotes the set of partitions of Σ. We consider transition systems (Σ, R) where the relation R ⊆ Σ × Σ (also denoted by →R) is total. A Kripke structure K = (Σ, R, AP, ℓ) consists of a transition system (Σ, R) together with a set AP of atomic propositions and a labelling function ℓ : Σ → ℘(AP). Paths in K are defined by Path(K) ≜ {π : N → Σ | ∀i ∈ N. πi →R πi+1}. A transition relation R ⊆ Σ × Σ defines the usual pre/post transformers on ℘(Σ): preR, postR, p̃reR, p̃ostR. When clear from the context, the subscript R is sometimes omitted. The relations R∃∃, R∀∃ ⊆ ℘(Σ) × ℘(Σ) are defined as follows: (S1, S2) ∈ R∃∃ (respectively, R∀∃) iff ∃s1 ∈ S1. (respectively, ∀s1 ∈ S1.) ∃s2 ∈ S2. (s1, s2) ∈ R.

Abstract Interpretation and Completeness. As usual in standard abstract interpretation, abstract domains are specified by Galois connections/insertions (GCs/GIs) [4, 5]. A GC/GI of the abstract domain A into the concrete domain C through the abstraction and concretization maps α : C → A and γ : A → C will be denoted by (C, α, γ, A). GIs of a common concrete domain C are pre-ordered w.r.t. precision as usual: G1 = (C, α1, γ1, A1) ⊑ G2 = (C, α2, γ2, A2) (i.e., A1 is more precise than A2) iff γ1 ∘ α1 ⊑ γ2 ∘ α2. Moreover, G1 and G2 are equivalent when G1 ⊑ G2 and G2 ⊑ G1. Let G = (C, α, γ, A) be a GI, f : C → C be some concrete semantic function — for simplicity, we consider here 1-ary functions — and f♯ : A → A be a corresponding abstract function. ⟨A, f♯⟩ is a sound abstract interpretation when α ∘ f ⊑ f♯ ∘ α. The abstract function f^A ≜ α ∘ f ∘ γ : A → A is called the best correct approximation of f in A. Completeness in abstract interpretation corresponds to requiring the following strengthening of soundness: α ∘ f = f♯ ∘ α. This is called backward completeness, because an orthogonal notion of forward completeness may be considered: in fact, the soundness condition α ∘ f ⊑ f♯ ∘ α is equivalent to f ∘ γ ⊑ γ ∘ f♯, so that forward completeness for f♯ corresponds to strengthening soundness by requiring f ∘ γ = γ ∘ f♯. Giacobazzi et al. [12] observed that both backward and forward completeness uniquely depend upon the abstraction map, namely they are abstract domain properties. In fact, it turns out that there exists f♯ : A → A such that ⟨A, f♯⟩ is backward (forward) complete iff γ ∘ α ∘ f ∘ γ ∘ α = γ ∘ α ∘ f (respectively, γ ∘ α ∘ f ∘ γ ∘ α = f ∘ γ ∘ α). Thus, we say that a GI G is backward (forward) complete for f when γ ∘ α ∘ f ∘ γ ∘ α = γ ∘ α ∘ f (respectively, γ ∘ α ∘ f ∘ γ ∘ α = f ∘ γ ∘ α). Note that G is forward complete for f iff f maps elements in img(γ) to elements in img(γ).

If [[·]] : L → C and [[·]]♯ : L → A are, respectively, a concrete and an abstract semantics of a generic language L, then soundness and completeness for the abstract semantics [[·]]♯ are defined as follows: ⟨A, [[·]]♯⟩ is sound (respectively, backward complete, forward complete) if for any ϕ ∈ L, α([[ϕ]]) ≤A [[ϕ]]♯ (respectively, α([[ϕ]]) = [[ϕ]]♯, [[ϕ]] = γ([[ϕ]]♯)).

Recall that a GI G = (C, α, γ, A) is disjunctive (or additive) when γ is additive, i.e., when γ preserves arbitrary least upper bounds. It turns out that G is disjunctive iff img(γ) ⊆ C is join-closed, i.e., closed under arbitrary lub's. Disjunctive GIs can be "inverted" as follows, and such an inversion preserves forward completeness.

Proposition 2.1. Let G = (C≤, α, γ, A≤) be a disjunctive GI and f : C → C.
(i) Let α̃(c) ≜ ∨{a ∈ A | γ(a) ≤ c}. Then G̃ = (C≥, α̃, γ, A≥) is a GI.
(ii) G is forward complete for f iff G̃ is forward complete for f. In this case, the two best correct approximations of f w.r.t. G and G̃ coincide.

3 Abstract Models

3.1 Abstract Semantics

We consider (temporal) specification languages L whose state formulas ϕ are inductively defined by L ∋ ϕ ::= p | f(ϕ1, ..., ϕn), where p ∈ AP ranges over a set of atomic propositions while f ranges over a finite set Op of operators. AP and Op are also denoted, respectively, by APL and OpL. Each f ∈ Op has an arity ar(f) > 0. The interpretation of formulas in L is determined by a semantic structure S = (Σ, I), where Σ is a set of states and I is an interpretation function which maps p ∈ AP to I(p) ∈ ℘(Σ) and f ∈ Op to I(f) : ℘(Σ)^ar(f) → ℘(Σ). We also use p and f to denote, respectively, I(p) and I(f). Also, AP ≜ {I(p) ∈ ℘(Σ) | p ∈ AP} and Op ≜ {I(f) : ℘(Σ)^ar(f) → ℘(Σ) | f ∈ Op}. The concrete state semantic function [[·]]S : L → ℘(Σ) evaluates a formula ϕ ∈ L to the set of states making ϕ true w.r.t. the semantic structure S: [[p]]S = p and [[f(ϕ1, ..., ϕn)]]S = f([[ϕ1]]S, ..., [[ϕn]]S).

Semantic structures generalize the role of Kripke structures. In fact, in standard model checking [3], a semantic structure is usually defined through a Kripke structure K, so that the interpretation of operators in Op is defined in terms of paths in K and of standard logical operators. In the following, we will freely use standard logical and temporal operators together with their corresponding usual interpretations: for example, I(∧) = ∩, I(¬) = ∁, I(EX) = preR, etc.

Following the abstract interpretation approach, an abstract semantic structure is given by S♯ = (A, I♯), where (C, α, γ, A) is a GI and, for any p ∈ AP and f ∈ Op, I♯(p) ∈ A and I♯(f) : A^ar(f) → A. Thus, an abstract semantic structure S♯ defines an abstract semantics [[·]]S♯ : L → A for the language L. Let S be a (concrete) semantic structure for L. A GI (C, α, γ, A) always induces an abstract semantic structure S^A = (A, I^A), where I^A provides the best correct approximations on A of the concrete interpretations of constants/operators: I^A(p) ≜ α(I(p)) for p ∈ AP and I^A(f) ≜ (I(f))^A for f ∈ Op. If the (concrete) interpretation Op consists of monotone functions, then the abstract semantics induced by S^A is always automatically sound. This induced abstract semantics will be denoted by [[·]]^A_S.
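The characterization of forward completeness recalled above, namely that f must map elements of img(γ) to elements of img(γ), can be tested directly on a finite toy domain. A sketch (ours):

gammas = [set(), {0, 2, 4}, {1, 3, 5}, {0, 1, 2, 3, 4, 5}]   # img(gamma) of a toy parity domain

def fwd_complete(f):
    # forward complete iff f maps gamma-images to gamma-images
    return all({f(x) for x in g} in gammas for g in gammas)

assert fwd_complete(lambda x: (x + 1) % 6)       # the successor flips parity uniformly
assert not fwd_complete(lambda x: (x * x) % 6)   # {0,2,4} maps to {0,4}: not a gamma-image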
Example 3.1. Let us consider the following Kripke structure K, where superscripts denote the labelling function, together with the abstract domain A. (In the original figure: K has six states 1p, 2q, 3p, 4q, 5q, 6r with, in particular, EX({6}) = {5, 6}; A is the lattice with elements ⊥ ⊑ p′, q′, r′, with q′, r′ ⊑ qr′ and p′, qr′ ⊑ ⊤.)

Let L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | EXϕ. Let S be the semantic structure for L induced by the Kripke structure K, so that EX = pre→. We consider the abstraction map α : ℘(Σ)⊆ → A where α on singletons is defined by α({1}) ≜ α({3}) ≜ p′, α({2}) ≜ α({4}) ≜ α({5}) ≜ q′ and α({6}) ≜ r′, while for any S ∈ ℘(Σ), α(S) ≜ ∨_{s∈S} α({s}). Hence, we have that:

[[EXr]]^A_S = EX^A([[r]]^A_S) = EX^A(α(r)) = EX^A(α({6})) = EX^A(r′) = α(EX(γ(r′))) = α(EX({6})) = α({5, 6}) = α({5}) ∨ α({6}) = q′ ∨ r′ = qr′.

Since γ(qr′) = {2, 4, 5, 6}, observe that, as expected, the abstract semantics [[EXr]]^A_S is a proper over-approximation in A of the concrete semantics [[EXr]]S = {5, 6}.  □

3.2 Partitioning Abstractions

As shown in [17], standard partition-based abstract model checking [2, 3] can be viewed as a particular instance of the abstract semantics defined in Section 3.1, where: (i) given some state partition P ∈ Part(Σ), the abstract domain is ℘(P)⊆, where the abstraction map is the "covering" function αP : ℘(Σ)⊆ → ℘(P)⊆ such that αP(S) ≜ {B ∈ P | B ∩ S ≠ ∅}, while γP : ℘(P)⊆ → ℘(Σ)⊆ is given by γP(X) = ∪_{B∈X} B; (ii) if the concrete interpretation function I is based on a concrete Kripke structure K, then the abstract interpretation function I♯ is simply given by the evaluation of I on an abstract Kripke structure A = (P, R♯, AP, ℓ♯) which replaces K, where R♯ ⊆ P × P is the abstract transition relation on the abstract state space P. Thus, in this sense, an abstract Kripke structure always induces an abstract semantics for a language.

Any GI G = (℘(Σ)⊆, α, γ, A) which is equivalent to a GI (℘(Σ)⊆, αP, γP, ℘(P)⊆), for some partition P ∈ Part(Σ), is called partitioning. It turns out (see [17]) that G is partitioning iff γ(A) is closed under complementation. Of course, not every abstraction of ℘(Σ)⊆ is partitioning. For instance, if s̄ ∈ Σ, A = {⊥, ⊤}, γ(⊥) = {s̄} and γ(⊤) = Σ, then (℘(Σ)⊆, α, γ, A) is a disjunctive GI, where α denotes the left adjoint of γ, which is not partitioning, because γ(A) = {{s̄}, Σ} is not closed under complementation. This opens the question whether it is possible to minimally refine a given abstract domain in order to make it partitioning. Given a GI G = (℘(Σ)⊆, α, γ, A), we define an equivalence relation ∼G on Σ by identifying those states that are blurred by the abstraction α: s ∼G t iff α({s}) = α({t}). This is an equivalence relation, namely a partition in Part(Σ), and therefore it induces a partitioning abstraction that we denote by P(G). As shown in [17], it turns out that P(G) is the least partitioning refinement of G, that is: P(G) ⊑ G and, for any partitioning G′ ⊑ G, G′ ⊑ P(G).
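The covering maps αP and γP of this section are one-liners. A sketch (ours) on the partition P = {[13], [245], [6]} induced by the abstraction of Example 3.1, reproducing γ(qr′) = {2, 4, 5, 6}:

P = [frozenset({1, 3}), frozenset({2, 4, 5}), frozenset({6})]

def alpha_P(S): return {B for B in P if B & S}             # blocks meeting S
def gamma_P(X): return set().union(*X) if X else set()     # union of the blocks

assert gamma_P(alpha_P({5, 6})) == {2, 4, 5, 6}            # gamma(qr') of Ex. 3.1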
3.3 Strong Preservation

As recalled above, standard abstract model checking [2, 3] is based on state partitions and abstract Kripke structures. Strong preservation for a language L encodes the equivalence of abstract and concrete validity for formulas of L. Given a partition P ∈ Part(Σ), let [[·]]_P : L → ℘(P) denote an abstract semantics defined on ℘(P); for example (but not necessarily), this can be the abstract semantics induced by an abstract Kripke structure (P, R′, AP, ℓ′). A partition P ∈ Part(Σ) is strongly preserving (s.p. for short) for L when, for any s ∈ Σ and ϕ ∈ L, s ∈ [[ϕ]] iff αP({s}) ⊆ [[ϕ]]_P. It is known [8, 9, 17] that the coarsest s.p. partition P_L for L is given by the following state equivalence ∼_L induced by L: s1 ∼_L s2 iff ∀ϕ ∈ L. s1 ∈ [[ϕ]] ⇔ s2 ∈ [[ϕ]].
Obviously, the definition of an abstract Kripke structure inducing a s.p. abstract semantics depends on the language L. Let us recall some well-known examples [2, 3, 13]. Let K = (Σ, R, AP, ℓ) be a concrete Kripke structure and let P_sim, P_bis ∈ Part(Σ) denote, respectively, simulation and bisimulation equivalence on K. Then, the abstract semantics induced by the abstract Kripke structure (P_sim, R∀∃, AP, ℓ′), where ℓ′(B) = ∪_{s∈B} ℓ(s), is s.p. for ACTL∗, while that induced by (P_bis, R∃∃, AP, ℓ′) is s.p. for CTL∗. Strong preservation was generalized in [17] to abstract domains as follows.

Definition 3.3. Let S = (Σ, I) and S′ = (A, I′) be, respectively, concrete and abstract semantic structures for L, and let [[·]]_S′ : L → A be the corresponding abstract semantics. S′ (or [[·]]_S′) is strongly preserving for L (w.r.t. S) when, for any S ∈ ℘(Σ) and ϕ ∈ L, S ⊆ [[ϕ]]_S ⇔ α(S) ≤_A [[ϕ]]_S′.

The following simple but key result shows that strong preservation amounts to forward completeness.

Theorem 3.4. S′ is s.p. for L iff the abstract semantics ⟨A, [[·]]_S′⟩ is forward complete.

It turns out (cf. [17]) that if a s.p. abstract semantics on the abstract domain A exists, then the abstract semantics [[·]]^A_S induced by A is s.p. as well, so that strong preservation is a property of the abstract domain. Hence, we say that the GI G = (℘(Σ)⊆, α, γ, A) (or simply A, when the GI is clear from the context) is s.p. for L if S_A is s.p. for L (or, equivalently, if a s.p. abstract semantics on A exists). In this case, by Theorem 3.4, we also say that the abstract domain A is language forward complete for L.

Example 3.5. Let us consider again Example 3.1. It turns out that A is not s.p. for L, because γ([[EXr]]^A_S) = γ(qr′) = {2, 4, 5, 6}, while [[EXr]]_S = {5, 6}. Therefore, for instance, 2 ∈ γ([[EXr]]^A_S) ∖ [[EXr]]_S or, equivalently, α({2}) ≤ [[EXr]]^A_S whilst 2 ∉ [[EXr]]_S.

4 Abstract Semantics

It is known (see e.g. [7]) that if an abstract domain A is forward complete for all the constants/operators in AP ∪ Op (where atomic propositions are viewed as 0-ary operators), a property here also called operator-wise forward completeness, then A is language forward complete for L, i.e., for all ϕ ∈ L, [[ϕ]]_S = γ([[ϕ]]^A_S). The converse is in general not true, as Example 4.1 below shows; a brute-force check of language forward completeness is sketched next.
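By Theorem 3.4, checking strong preservation of the induced abstract semantics reduces to forward completeness of ⟨A, [[·]]^A_S⟩. For finite systems one can at least semi-test this property by enumerating formulas up to a bounded depth, as in the following sketch (illustrative; the bounded enumeration is an assumption and only refutes, never proves, strong preservation).

    # Sketch: bounded test of language forward completeness (Theorem 3.4).
    # Formulas are tuples: ('p',) for an atom, (f, phi) for a unary operator.

    def concrete_sem(phi, interp):
        if len(phi) == 1:
            return frozenset(interp[phi[0]])
        op, *subs = phi
        return frozenset(interp[op](*[concrete_sem(s, interp) for s in subs]))

    def abstract_sem(phi, interp, alpha, gamma):
        """Induced abstract semantics: each operator is replaced by its best
        correct approximation alpha o f o gamma (Section 3.1)."""
        if len(phi) == 1:
            return alpha(frozenset(interp[phi[0]]))
        op, *subs = phi
        args = [gamma(abstract_sem(s, interp, alpha, gamma)) for s in subs]
        return alpha(frozenset(interp[op](*args)))

    def formulas(atoms, unary_ops, depth):
        layer = [(a,) for a in atoms]
        for _ in range(depth):
            layer += [(f, phi) for f in unary_ops for phi in layer]
        return layer

    def strongly_preserving(interp, atoms, unary_ops, alpha, gamma, depth=5):
        # [[phi]]_S must equal gamma([[phi]]^A_S) for every enumerated phi.
        return all(concrete_sem(phi, interp) ==
                   gamma(abstract_sem(phi, interp, alpha, gamma))
                   for phi in formulas(atoms, unary_ops, depth))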
Example 4.1. Let us consider the following Kripke structure K and the partitioning abstract domain A induced by the partition P = {[12], [3]}, i.e. A = ℘(P)⊆.

[Figure: K is the chain 1 → 2 → 3 with a self-loop on 3; all three states are labelled p.]

Let us consider the language L ∋ ϕ ::= p | EXϕ. The Kripke structure K induces the semantic structure S = ({1, 2, 3}, I) such that I(p) = {1, 2, 3} and I(EX) = pre_R. Hence, we have that [[p]]_S = {1, 2, 3}, [[EXp]]_S = {1, 2, 3} and, for k > 1, [[EX^k p]]_S = {1, 2, 3}. On the abstract side, we have that [[p]]^A_S = {[12], [3]}, [[EXp]]^A_S = {[12], [3]} and, for k > 1, [[EX^k p]]^A_S = {[12], [3]}. Thus, for any ϕ ∈ L, [[ϕ]]_S = γP([[ϕ]]^A_S), i.e. the abstract domain A is language forward complete for L. On the other hand, pre_R(γP(αP({3}))) = pre_R({3}) = {2, 3}, while γP(αP(pre_R(γP(αP({3}))))) = γP(αP({2, 3})) = {1, 2, 3}, so that A is not forward complete for pre_R.

Operator-wise forward completeness is easier to check than language forward completeness. Moreover, the problem of refining an abstract domain in order to make it forward (or backward) complete for a given set of operators admits constructive fixpoint solutions [12, 18]. It is thus interesting to determine conditions on abstract domains which guarantee that operator-wise and language forward completeness are equivalent.

Definition 4.2. Let S = (Σ, I) be a semantic structure for L and (℘(Σ)⊆, α, γ, A) be a GI. The abstract domain A is L-covered by the concrete semantics [[·]]_S (or simply by S) when, for any a ∈ A, there exists ϕ ∈ L such that γ(a) = [[ϕ]]_S.

This notion of covering indeed ensures the equivalence of operator-wise and language forward completeness.

Theorem 4.3. Let A be L-covered by S. Then, A is language forward complete for L iff A is forward complete for all the constants/operators in AP_L ∪ Op_L.

As recalled above, given an abstract domain A, if an abstract semantic structure S′ = (A, I′) is s.p. for L, then the abstract structure S_A = (A, I^A) induced by A is s.p. for L as well. However, the interpretation functions I′ and I^A may differ.

Example 4.4. Let us consider again Example 4.1. Let us first note that A is not L-covered by S, because {[[ϕ]]_S | ϕ ∈ L} = {{1, 2, 3}}. Let us consider the abstract semantic structure S′ = (A, I′) induced by the following abstract Kripke structure:

[Figure: [12]^p → [3]^p, with a self-loop on [3].]

Hence, I′(EX) = pre_R′, where pre_R′(∅) = ∅, pre_R′({[12]}) = ∅, pre_R′({[3]}) = {[12], [3]} and pre_R′({[12], [3]}) = {[12], [3]}. It is easy to see that S′ is s.p. for L. In fact, we have that γP([[p]]_S′) = γP({[12], [3]}) = {1, 2, 3} = [[p]]_S and γP([[EXp]]_S′) = γP(pre_R′({[12], [3]})) = γP({[12], [3]}) = {1, 2, 3} = [[EXp]]_S, so that, by Theorem 3.4, S′ is s.p. for L. However, it turns out that I′(EX) ≠ I^A(EX) = αP ∘ pre_R ∘ γP. In fact, pre_R′({[12]}) = ∅, while αP(pre_R(γP({[12]}))) = αP(pre_R({1, 2})) = αP({1}) = {[12]}. Thus, S′ and S_A are two different abstract semantic structures which are both s.p. for L.

Thus, in general, for a given abstract domain A, there may be different s.p. abstract semantic structures defined over A. However, if A is L-covered by the concrete semantic structure, then at most one s.p. abstract semantic structure can exist on A.

Corollary 4.5. If A is L-covered by S and S′ = (A, I′) is s.p. for L, then I′ = I^A.

Thus, when A is L-covered by S, the only interpretation of constants/operators on A which can be s.p. for L is given by their best correct approximations on A.
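The gap exhibited in Example 4.1 (language forward complete, but not operator-wise forward complete) can be replayed mechanically. The sketch below is illustrative; the relation R encodes the three-state chain with the self-loop on state 3.

    # Replaying Example 4.1: A = P({[12],[3]}) is language forward complete for
    # L ::= p | EX phi, yet not forward complete for the single operator pre_R.

    from itertools import combinations

    P = [frozenset({1, 2}), frozenset({3})]
    R = {(1, 2), (2, 3), (3, 3)}              # chain 1 -> 2 -> 3 with 3 -> 3

    pre = lambda X: {s for (s, t) in R if t in X}
    alpha_P = lambda S: {B for B in P if B & S}
    gamma_P = lambda X: set().union(*X) if X else set()

    # Operator-wise check: does pre_R map gamma-images to gamma-images?
    gamma_images = [gamma_P(set(c)) for r in range(len(P) + 1)
                    for c in combinations(P, r)]
    op_complete = all(pre(g) in [set(h) for h in gamma_images]
                      for g in gamma_images)
    print(op_complete)                        # False: pre({3}) = {2,3} is no block union

    # Language check: [[p]] = {1,2,3} and pre_R({1,2,3}) = {1,2,3}, so every
    # formula of L denotes {1,2,3}, which gamma_P(alpha_P(.)) preserves.
    sem = {1, 2, 3}
    for _ in range(5):
        assert gamma_P(alpha_P(sem)) == sem
        sem = pre(sem)
    print('language forward complete on all EX^k p')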
Example 4.6. Let us consider the language L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | ¬ϕ | EXϕ and the following Kripke structure K with transition relation R.

[Figure: the concrete Kripke structure K, with states 1, 3 labelled p, state 2 labelled q and state 4 labelled r, whose transitions (as determined by the pre_R values below) are 1 → 2, 1 → 4, 3 → 2, 3 → 4 and 4 → 2; and the abstract Kripke structure A over the blocks [13]^p, [2]^q, [4]^r, with transitions [13] → [2], [13] → [4] and [4] → [2].]

This induces a concrete semantic structure S = ({1, 2, 3, 4}, I) where I(p) = {1, 3}, I(q) = {2}, I(r) = {4}, I(¬) = ∁, I(∧) = ∩ and I(EX) = pre_R. Let us consider the state partition P = {[13], [2], [4]} and the corresponding abstract Kripke structure A depicted above, where the transition relation is given by R∃∃. Let us consider the abstract semantic structure S′ = (A, I′) induced by A, i.e. A = ℘(P)⊆ and I′(p) = {[13]}, I′(q) = {[2]}, I′(r) = {[4]}, I′(¬) = ∁, I′(∧) = ∩ and I′(EX) = pre_R∃∃. It is easy to check that I′(¬), I′(∧) and I′(EX) are indeed the best correct approximations on A of, respectively, the concrete operations of set complementation ∁ = I(¬), set intersection ∩ = I(∧) and pre_R = I(EX). Hence, I′ = I^A, namely S′ = S_A.
It turns out that A is L-covered by S. In fact, since the set of concrete semantics of formulas in L is closed under set complementation, any union of blocks of P belongs to {[[ϕ]]_S | ϕ ∈ L}, so that img(γP) ⊆ {[[ϕ]]_S | ϕ ∈ L}. We also have that S_A is s.p. for L. This happens because A is forward complete for the constants/operators of L: all the concrete operations ∁, ∩ and pre_R map unions of blocks in ℘(P) into unions of blocks in ℘(P), and therefore the abstract domain A = ℘(P) is forward complete for them. For example, this holds for pre_R because pre_R({1, 3}) = ∅, pre_R({2}) = {1, 3, 4} and pre_R({4}) = {1, 3}. Hence, since A is operator-wise forward complete, A is language forward complete for L as well and therefore, by Theorem 3.4, S_A is s.p. for L. Consequently, by Corollary 4.5, S_A is the unique abstract semantic structure on the abstract domain A which is s.p. for L.

It may also happen that a s.p. abstract semantics can be defined on some partition P although this abstract semantics cannot be derived from any abstract Kripke structure on P, as shown by the following example.

Example 4.7. Consider the following simple language L ∋ ϕ ::= p | AXXϕ and the following Kripke structure K, where R is the transition relation.

[Figure: the cycle r → ry → g → y → r; states r and ry are labelled stop, states g and y are labelled go.]

This models a four-state traffic light controller (as in the U.K. and in Germany). It gives rise to a concrete semantic structure S = ({r, ry, g, y}, I) where I(stop) = {r, ry}, I(go) = {g, y} and I(AXX) = p̃re_R², the universal predecessor over two-step paths. Hence, according to this standard interpretation, s ∈ [[AXXϕ]]_S iff, for any path π0 π1 π2 ... in K starting from s = π0, we have π2 ∈ [[ϕ]]_S. Observe that [[AXXstop]]_S = {g, y} and [[AXXgo]]_S = {r, ry}.
Consider the partition P = {[r, ry], [g, y]} and the corresponding partitioning abstract domain A = ℘(P)⊆. Hence, for the corresponding abstract semantic structure S_A = (A, I^A), we have that I^A(stop) = {[r, ry]}, I^A(go) = {[g, y]} and I^A(AXX) = αP ∘ p̃re_R² ∘ γP, so that:

I^A(AXX)(∅) = ∅; I^A(AXX)({[r, ry]}) = {[g, y]}; I^A(AXX)({[g, y]}) = {[r, ry]}; I^A(AXX)({[r, ry], [g, y]}) = {[r, ry], [g, y]}.

By Theorem 3.4, it turns out that S_A is s.p. for L, because A is forward complete for p̃re_R². In fact, p̃re_R² maps unions of blocks of P to unions of blocks of P, because p̃re_R²(∅) = ∅, p̃re_R²({r, ry}) = {g, y}, p̃re_R²({g, y}) = {r, ry} and p̃re_R²({r, ry, g, y}) = {r, ry, g, y}.
However, let us show that there exists no abstract transition relation R′ ⊆ P × P on the partition P such that the abstract Kripke structure A = (P, R′, AP, ℓ′) induces an abstract semantic structure which is s.p. for L. Assume, by contradiction, that such an abstract Kripke structure A exists, and let S′ be the corresponding induced abstract semantic structure. Let B1 = [r, ry] ∈ P and B2 = [g, y] ∈ P. Since r ∈ [[AXXgo]]_S and g ∈ [[AXXstop]]_S, by strong preservation it must be that B1 ∈ [[AXXgo]]_S′ and B2 ∈ [[AXXstop]]_S′. Thus, necessarily, (B1, B2), (B2, B1) ∈ R′. This leads to the contradiction B1 ∉ [[AXXgo]]_S′. In fact, if R′ = {(B1, B2), (B2, B1)}, then the only path from B1 alternates B1 and B2, so that two steps always lead back to B1, which is labelled stop; hence B1 ∉ [[AXXgo]]_S′. Moreover, if instead (B1, B1) ∈ R′ (the case (B2, B2) ∈ R′ is analogous), then we would still have B1 ∉ [[AXXgo]]_S′. Even more, along the same lines it is not difficult to show that no proper abstract Kripke structure induces an abstract semantic structure which strongly preserves L: even if we split one of the two blocks B1 or B2, we still cannot define an abstract transition relation ensuring strong preservation for L.

5 Fixpoints in Abstract Semantics

The above abstract interpretation-based approach to abstract model checking systematically defines the abstract semantics by approximating the interpretation of logical/temporal operators through their best correct approximations on the abstract domain. In principle, this can be done for any logical/temporal operator. However, when a temporal operator f can be expressed as a least/greatest fixpoint of another temporal operator g, e.g. f(S) = lfp(λX.g(X, S)), the best correct approximation α ∘ f ∘ γ might not be characterizable as a least/greatest fixpoint. Ideally, we would aim at approximating g through some abstract operator g′, in order to be able to characterize α ∘ f ∘ γ as the abstract least fixpoint of g′.
Let us illustrate this through the case of the "finally" operator EF, whose standard interpretation can be characterized as a fixpoint: EF(S) = lfp(λX. S ∪ EX(X)). The best correct approximation of EF w.r.t. a Galois insertion (℘(Σ)⊆, α, γ, A) is the abstract function α ∘ EF ∘ γ : A → A. However, this definition gives us no clue for computing α ∘ EF ∘ γ as a least fixpoint. By contrast, in standard abstract model checking, the abstract interpretation of the language operators is based on an abstract transition relation defined on the abstract state space, i.e. an abstract Kripke structure, so that it is enough to compute the least fixpoint lfp(λX. S ∪ EX(X)) on the abstract Kripke structure.
For example, consider the language L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | ¬ϕ | EFϕ. Let K = (Σ, R, AP, ℓ) be a concrete Kripke structure. One can easily see that if P ∈ Part(Σ) is s.p. for L, then the abstract Kripke structure on P with abstract transition relation R∃∃ ⊆ P × P is s.p. for L. In this case, while the concrete fixpoint is given by EF(S) = lfp(λX. S ∪ pre_R(X)) for any S ⊆ Σ, the abstract fixpoint is lfp(λX′. S′ ∪_P pre_R∃∃(X′)) for any S′ ⊆ P, where ∪_P is union of blocks in P, namely the least upper bound of the abstract domain ℘(P)⊆. Recall that the abstract domain ℘(P)⊆ is related to the concrete domain ℘(Σ)⊆ by the GI G_P = (℘(Σ)⊆, αP, γP, ℘(P)⊆). The key point to note here is that λX′, Y′. X′ ∪_P pre_R∃∃(Y′) is indeed the best correct approximation of the concrete operation λX, Y. X ∪ pre_R(Y) through the GI G_P.
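Before stating the general result, note that both fixpoints are directly executable on finite structures by Kleene iteration. The following sketch (illustrative, with assumed data matching Example 5.2 below) computes EF concretely and on a partition via R∃∃.

    # Kleene iteration for EF(S) = lfp(X -> S ∪ pre_R(X)), concretely and on
    # a partition with the existential abstract relation R∃∃.

    def lfp(f, bottom=frozenset()):
        """Least fixpoint of a monotone f on finite powersets, by iteration."""
        x = bottom
        while True:
            y = f(x)
            if y == x:
                return x
            x = y

    R = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 6)}   # chain of Example 5.2
    pre = lambda X: frozenset(s for (s, t) in R if t in X)
    EF = lambda S: lfp(lambda X: frozenset(S) | pre(X))

    P = [frozenset(b) for b in ({1}, {2}, {3}, {4, 5}, {6})]
    R_ee = {(B, C) for B in P for C in P
            if any((s, t) in R for s in B for t in C)}      # R∃∃ on blocks
    pre_ee = lambda X: frozenset(B for (B, C) in R_ee if C in X)
    EF_abs = lambda S: lfp(lambda X: frozenset(S) | pre_ee(X))

    print(sorted(EF({6})))                        # [1, 2, 3, 4, 5, 6]
    print(sorted(map(sorted, EF_abs({P[4]}))))    # all five blocks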
These observations lead us to the following generalization.

Theorem 5.1. Let C be a complete lattice, (C, α, γ, A) be a GI and f : C^(n+1) → C be monotone. Let F =def λc ∈ C^n. lfp(λx. f(c1, ..., x, ..., cn)). If A is forward complete for F, then F^A = λa ∈ A^n. lfp(λx. f^A(a1, ..., x, ..., an)).

Let us remark that the above result can also be stated, by duality, for greatest fixpoints as follows: if (C_≥, α, γ, A_≥) is a GI, F =def λc ∈ C^n. gfp(λx. f(c1, ..., x, ..., cn)) and A is forward complete for F, then F^A = λa ∈ A^n. gfp(λx. f^A(a1, ..., x, ..., an)).
By Theorems 3.4 and 4.3, given a language L and a semantic structure S for L, if A is L-covered by S, then A is forward complete for the constants/operators in AP_L ∪ Op_L iff S_A is s.p. for L. Thus, in this case, by Theorem 5.1, if S_A is s.p. for L and Op_L includes an operator f which can be expressed as a least/greatest fixpoint of some operation g, then the best correct approximation of f on A can be obtained as the abstract least/greatest fixpoint of the best correct approximation of g on A.

Example 5.2. Let us consider L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | ¬ϕ | EFϕ and the following Kripke structure K with transition relation R, which induces a concrete semantic structure S.

[Figure: the chain 1^p → 2^q → 3^p → 4^q → 5^q → 6^r, with a self-loop on 6.]

Let us consider the partition P = {[1], [2], [3], [45], [6]} and the corresponding abstract Kripke structure A depicted below, where the transition relation is given by R∃∃.

[Figure: [1]^p → [2]^q → [3]^p → [45]^q → [6]^r, with self-loops on [45] and [6].]

Let S_A be the abstract semantic structure induced by the abstract domain A = ℘(P)⊆. It turns out that S_A is s.p. for L, because A is forward complete for (AP_L and) Op_L = {∩, ∁, EF}: in fact, it is easy to check that A is forward complete for EF, because EF maps unions of blocks of P to unions of blocks of P. Since A is forward complete for EF and EF(S) = lfp(λX. f(S, X)), where f(S, X) =def S ∪ pre_R(X), by Theorem 5.1 we have that EF^A = λS′. lfp(λX′. f^A(S′, X′)) : ℘(P) → ℘(P). Moreover, as discussed above, f^A(S′, X′) = αP(f(γP(S′), γP(X′))) = S′ ∪_P pre_R∃∃(X′), so that EF^A = λS′. lfp(λX′. S′ ∪_P pre_R∃∃(X′)); namely, the best correct approximation EF^A can be computed through the least fixpoint characterization of the "finally" operator on the above abstract Kripke structure A.

6 Applications

We are mainly interested in applying Theorem 5.1 to the standard fixpoint-based operators of well-known temporal languages (cf. [3]), as recalled in Table 1.

Table 1. Temporal operators in fixpoint form

  "Finally":          AF(S) = lfp(λX. S ∪ AX(X))                EF(S) = lfp(λX. S ∪ EX(X))
  "Globally":         AG(S) = gfp(λX. S ∩ AX(X))                EG(S) = gfp(λX. S ∩ EX(X))
  "(Strong) Until":   AU(S, T) = lfp(λX. T ∪ (S ∩ AX(X)))       EU(S, T) = lfp(λX. T ∪ (S ∩ EX(X)))
  "Weak Until":       AUw(S, T) = gfp(λX. T ∪ (S ∩ AX(X)))      EUw(S, T) = gfp(λX. T ∪ (S ∩ EX(X)))
  "(Weak) Release":   AR(S, T) = gfp(λX. T ∩ (S ∪ AX(X)))       ER(S, T) = gfp(λX. T ∩ (S ∪ EX(X)))
  "Strong Release":   ARs(S, T) = lfp(λX. T ∩ (S ∪ AX(X)))      ERs(S, T) = lfp(λX. T ∩ (S ∪ EX(X)))

6.1 Partitioning Abstractions

Let P ∈ Part(Σ) be any partition and let G = (℘(Σ)⊆, αP, γP, ℘(P)⊆) be the corresponding partitioning GI. By Proposition 2.1 (i), G′ = (℘(Σ)⊇, α′P, γP, ℘(P)⊇) is a GI, where α′P(S) = {B ∈ P | B ⊆ S}. Hence, while G over-approximates a set S by the set of blocks of P which have a nonempty intersection with S, G′ under-approximates S by the set of blocks of P which are contained in S.
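The two adjoints differ only in how blocks straddling S are treated; a tiny sketch (with illustrative data) makes the over/under-approximation pair explicit.

    # Over- vs under-approximation of a set of states by blocks of a partition:
    # alpha_P keeps every block meeting S, alpha'_P only the blocks inside S.

    P = [frozenset({1}), frozenset({2}), frozenset({3}),
         frozenset({4, 5}), frozenset({6})]

    def alpha_over(S):       # covering map of G
        return {B for B in P if B & S}

    def alpha_under(S):      # inner map of G' (Proposition 2.1 (i))
        return {B for B in P if B <= S}

    S = {3, 4, 6}
    print(sorted(map(sorted, alpha_over(S))))    # [[3], [4, 5], [6]]
    print(sorted(map(sorted, alpha_under(S))))   # [[3], [6]]
    # gamma_P(alpha_under(S)) ⊆ S ⊆ gamma_P(alpha_over(S)) always holds.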
Thus, we can apply Theorem 5.1 to G for least fixpoints and its dual to G′ for greatest fixpoints. Since G is disjunctive, by Proposition 2.1 (ii), G is forward complete for some function F iff G′ is forward complete for F; hence, the hypotheses of Theorem 5.1 for least and greatest fixpoints actually coincide. Furthermore, in this case, the best correct approximations of F w.r.t. G and G′ coincide. In order to distinguish which GI has been applied, we use f^A to denote the best correct approximation of a concrete function f w.r.t. G, while f^A′ denotes the best correct approximation of f w.r.t. G′.
For the standard temporal fixpoint-based operators of Table 1, the following result shows that their best correct approximations on a s.p. partitioning abstract domain preserve their characterizations as least/greatest fixpoints.

Corollary 6.1. Let P ∈ Part(Σ) and G = (℘(Σ)⊆, αP, γP, A = ℘(P)⊆) be the corresponding partitioning GI. Assume that G is forward complete for some fixpoint-based operator F in Table 1. Then, the corresponding best correct approximations of F are as follows:

  AF^A(S′) = lfp(λX′. S′ ∪_P p̃re^A_R(X′))                    EF^A(S′) = lfp(λX′. S′ ∪_P pre^A_R(X′))
  AG^A′(S′) = gfp(λX′. S′ ∩_P p̃re^A′_R(X′))                  EG^A′(S′) = gfp(λX′. S′ ∩_P pre^A′_R(X′))
  AU^A(S′, T′) = lfp(λX′. T′ ∪_P (S′ ∩_P p̃re^A_R(X′)))       EU^A(S′, T′) = lfp(λX′. T′ ∪_P (S′ ∩_P pre^A_R(X′)))
  AUw^A′(S′, T′) = gfp(λX′. T′ ∪_P (S′ ∩_P p̃re^A′_R(X′)))    EUw^A′(S′, T′) = gfp(λX′. T′ ∪_P (S′ ∩_P pre^A′_R(X′)))
  AR^A′(S′, T′) = gfp(λX′. T′ ∩_P (S′ ∪_P p̃re^A′_R(X′)))     ER^A′(S′, T′) = gfp(λX′. T′ ∩_P (S′ ∪_P pre^A′_R(X′)))
  ARs^A(S′, T′) = lfp(λX′. T′ ∩_P (S′ ∪_P p̃re^A_R(X′)))      ERs^A(S′, T′) = lfp(λX′. T′ ∩_P (S′ ∪_P pre^A_R(X′)))

Here, following Theorem 5.1 and its dual, least fixpoints are computed w.r.t. G and greatest fixpoints w.r.t. G′, while AX and EX are interpreted, respectively, as the universal predecessor p̃re_R and the existential predecessor pre_R.
It turns out that the best correct approximations pre^A_R and p̃re^A′_R can be characterized through the abstract transition relation R∃∃ ⊆ P × P as follows.

Lemma 6.2. pre^A_R = pre_R∃∃ and p̃re^A′_R = p̃re_R∃∃.

Let Op ⊆ {EX, AX, EF, AG, EU, AUw, AR, ERs} be a set of temporal fixpoint-based operators and let L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | ¬ϕ | f(ϕ1, ..., ϕ_ar(f)), where f ∈ Op, be the corresponding language. Let K = (Σ, R, AP, ℓ) be a concrete Kripke structure and S be the concrete semantic structure for L induced by K. Consider now a partition P ∈ Part(Σ) and the corresponding abstract semantic structure S_P = (℘(P), I^P). Assume that S_P is s.p. for L. As a consequence of the above results, one can define an abstract Kripke structure on P, with abstract transition relation R∃∃, which induces precisely S_P.

Corollary 6.3. If S_P is s.p. for L, then S_P is induced by the abstract Kripke structure A_P =def (P, R∃∃, AP, ℓ_P), where ℓ_P =def λB ∈ P. {p ∈ AP | B ∈ I^P(p)}.

Thus, a strongly preserving abstract model checking of the language L can be performed on the abstract Kripke structure A_P.
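Lemma 6.2 makes both block-level predecessor operators directly computable from R∃∃. A small sketch follows (illustrative; the relation reuses Example 4.6's Kripke structure, and p̃re is written pre_tilde).

    # Block-level predecessor operators of Lemma 6.2: pre_{R∃∃} (existential)
    # and pre~_{R∃∃} (universal), computed from a finite relation and partition.

    def r_exists_exists(R, P):
        """B R∃∃ C iff some state of B has an R-successor in C."""
        return {(B, C) for B in P for C in P
                if any((s, t) in R for s in B for t in C)}

    def pre_ee(R_ee, X):
        """Existential: blocks with at least one successor block in X."""
        return {B for (B, C) in R_ee if C in X}

    def pre_tilde_ee(R_ee, P, X):
        """Universal: blocks all of whose successor blocks lie in X
        (vacuously true for blocks without successors)."""
        return {B for B in P
                if all(C in X for (B2, C) in R_ee if B2 == B)}

    R = {(1, 2), (1, 4), (3, 2), (3, 4), (4, 2)}           # Example 4.6
    P = [frozenset({1, 3}), frozenset({2}), frozenset({4})]
    R_ee = r_exists_exists(R, P)
    print(sorted(map(sorted, pre_ee(R_ee, {P[1]}))))          # [[1, 3], [4]]
    print(sorted(map(sorted, pre_tilde_ee(R_ee, P, {P[1]}))))  # [[2], [4]]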
Example 6.4. Let us consider L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | ¬ϕ | AGϕ and the following Kripke structure K, and let S be the concrete semantic structure for L induced by K.

[Figure: the concrete Kripke structure K, with states 1, 3 labelled p, state 2 labelled q and state 4 labelled r, whose transitions (as determined by the AG values below) are 1 → 2, 1 → 4, 3 → 4, 4 → 2 and 2 → 2; and the abstract Kripke structure A_P over the blocks [13]^p, [2]^q, [4]^r, with transitions [13] → [2], [13] → [4], [4] → [2] and [2] → [2].]

Let us consider the partition P = {[13], [2], [4]} and the corresponding abstract semantic structure S_P = (℘(P), I^P). It turns out that S_P is s.p. for L. This is a consequence of the fact that the abstract domain ℘(P) is operator-wise forward complete for L, hence ℘(P) is language forward complete for L and, in turn, by Theorem 3.4, S_P is s.p. for L. In fact, the following equalities show that ℘(P) is forward complete for AG, because AG maps unions of blocks of P to unions of blocks of P:

AG(∅) = AG({4}) = AG({1, 3}) = AG({1, 3, 4}) = ∅; AG({2}) = AG({1, 2, 3}) = {2}; AG({2, 4}) = {2, 4}; AG({1, 2, 3, 4}) = {1, 2, 3, 4}.

Thus, by Corollary 6.3, it turns out that S_P is induced by the abstract Kripke structure A_P = (P, R∃∃, AP, ℓ_P) depicted above. Let us notice that P is not a bisimulation on K, because the states 1 and 3 belong to the same block [13] and 1 → 2 while 3 ↛ 2. Thus, strong preservation of L on the abstract Kripke structure A_P, with abstract transition relation R∃∃, cannot be obtained as a consequence of the standard strong preservation results [2, 3, 13].

Example 6.5. Let us consider L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | ¬ϕ | EGϕ and the following Kripke structure K, and let S be the concrete semantic structure for L induced by K.

[Figure: K is the chain 1^p → 2^p → 3^q with a self-loop on 3. The two abstract Kripke structures on P = {[12], [3]} are A∃∃, with transitions [12] → [12], [12] → [3] and [3] → [3], and A∀∃, with the single transition [3] → [3].]

In this case, EG is not included among the operators of Corollary 6.3. Let us consider the partition P = {[12], [3]}, the abstract domain A = ℘(P)⊆ and the corresponding abstract semantic structure S_A = (A, I^A). It turns out that S_A is s.p. for L. As in Example 6.4, by Theorem 3.4, this derives from the following equalities, which show that A is forward complete for EG, because EG maps unions of blocks of P to unions of blocks of P: EG(∅) = EG({1, 2}) = ∅; EG({3}) = {3}; EG({1, 2, 3}) = {1, 2, 3}.
Let us point out that both the abstract Kripke structures A∃∃ and A∀∃ on P depicted above, whose abstract transition relations are, respectively, R∃∃ and R∀∃, are not s.p. for L. This is shown by the following counterexamples: [12] ⊨_A∃∃ EGp while 1 ⊭_K EGp, and [12] ⊭_A∀∃ EG(p ∨ q) while 1 ⊨_K EG(p ∨ q).
On the other hand, we can exploit Corollary 6.1, so that EG^A′(S′) = gfp(λX′. S′ ∩_P pre^A′_R(X′)), where pre^A′_R = α′P ∘ pre_R ∘ γP. For instance, we have that

pre^A′_R({[3]}) = α′P(pre_R(γP({[3]}))) = α′P(pre_R({3})) = α′P({2, 3}) = {[3]}.

Likewise, pre^A′_R(∅) = pre^A′_R({[12]}) = ∅ and pre^A′_R({[12], [3]}) = {[12], [3]}. As an example, we have that EG^A′({[3]}) = gfp(λX′. {[3]} ∩_P pre^A′_R(X′)) = {[3]}.
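Greatest fixpoints are computed dually, iterating downwards from the top element. The sketch below (illustrative) replays Example 6.5's computation of EG^A′ on ℘(P) with the under-approximating predecessor.

    # Downward Kleene iteration for EG^A'(S') = gfp(X' -> S' ∩ pre^A'_R(X')),
    # replaying Example 6.5 on P = {[12], [3]}.

    def gfp(f, top):
        """Greatest fixpoint of a monotone f on a finite powerset."""
        x = top
        while True:
            y = f(x)
            if y == x:
                return x
            x = y

    R = {(1, 2), (2, 3), (3, 3)}
    P = [frozenset({1, 2}), frozenset({3})]
    pre = lambda X: {s for (s, t) in R if t in X}
    gamma_P = lambda X: set().union(*X) if X else set()
    alpha_under = lambda S: frozenset(B for B in P if B <= S)

    def pre_under(X):        # pre^A'_R = alpha'_P o pre_R o gamma_P
        return alpha_under(pre(gamma_P(X)))

    S_abs = frozenset({P[1]})                       # the abstract set {[3]}
    EG_abs = gfp(lambda X: S_abs & pre_under(X), top=frozenset(P))
    print(sorted(map(sorted, EG_abs)))              # [[3]]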
6.2 Disjunctive Abstractions

In model checking, disjunctive abstract domains have been implicitly used by Henzinger et al.'s algorithm [14] for computing simulation equivalence: this algorithm maintains, for any state s ∈ Σ, a set of states sim(s) ⊆ Σ, which represents exactly a disjunctive abstract domain. As observed in Section 3.2, any partitioning abstract domain is disjunctive, while the converse is not true.
Let G = (℘(Σ)⊆, α, γ, A) be a disjunctive GI. By Proposition 2.1 (i), G′ = (℘(Σ)⊇, α′, γ, A_≥) is a GI, where α′(S) = ∨{a ∈ A | γ(a) ⊆ S}. Thus, we can apply Theorem 5.1 to G for least fixpoints and its dual to G′ for greatest fixpoints. Also, as already observed in Section 6.1, the hypotheses of Theorem 5.1 for least and greatest fixpoints coincide and, in this case, the best correct approximations of a concrete function w.r.t. G and G′ coincide. As before, we use f^A for the best correct approximation of a concrete function f w.r.t. G, while f^A′ denotes the best correct approximation of f w.r.t. G′. Here, we can generalize Corollary 6.1 to disjunctive abstract domains for the case of the "finally" and "globally" operators.

Corollary 6.6. Let G = (℘(Σ)⊆, α, γ, A) be a disjunctive GI. Assume that G is forward complete for some operator F ∈ {AF, EF, AG, EG}. Then, the corresponding best correct approximations of F are as follows:

  AF^A(S′) = lfp(λX′. S′ ∪ p̃re^A_R(X′));      EF^A(S′) = lfp(λX′. S′ ∪ pre^A_R(X′));
  AG^A′(S′) = gfp(λX′. S′ ∩ p̃re^A′_R(X′));    EG^A′(S′) = gfp(λX′. S′ ∩ pre^A′_R(X′)).

Example 6.7. Let us consider the concrete Kripke structure K of Example 5.2 and the language L ∋ ϕ ::= p | ϕ1 ∧ ϕ2 | ϕ1 ∨ ϕ2 | EFϕ. Let Atoms =def {[1], [2], [3], [6], [245]} and let A be the closure of Atoms under arbitrary unions. Let (℘(Σ)⊆, α, id, A⊆) be the corresponding disjunctive GI, where α is defined on singletons in ℘(Σ) as follows: α({1}) = [1]; α({2}) = [2]; α({3}) = [3]; α({4}) = [245]; α({5}) = [245]; α({6}) = [6].
It turns out that A is forward complete for EF, because EF maps atoms to unions of atoms and EF is additive: EF({1}) = {1}; EF({2}) = {1, 2}; EF({3}) = {1, 2, 3}; EF({6}) = {1, 2, 3, 4, 5, 6}; EF({2, 4, 5}) = {1, 2, 3, 4, 5}.
Thus, we can apply Corollary 6.6, so that EF^A(S′) = lfp(λX′. S′ ∪ pre^A_R(X′)), where pre^A_R = α ∘ pre_R ∘ id. For instance, pre^A_R on the atom [245] is as follows: pre^A_R([245]) = α(pre_R({2, 4, 5})) = α({1, 3, 4}) = [12345]. Likewise, pre^A_R on the remaining atoms is as follows: pre^A_R([1]) = ∅; pre^A_R([2]) = [1]; pre^A_R([3]) = [2]; pre^A_R([6]) = [2456].
As an example, EF^A([6]) = lfp(λX′. [6] ∪ pre^A_R(X′)) is computed as follows:

  X0 = ∅;
  X1 = [6] ∪ pre^A_R(∅) = [6] ∪ ∅ = [6];
  X2 = [6] ∪ pre^A_R([6]) = [6] ∪ [2456] = [2456];
  X3 = [6] ∪ pre^A_R([2456]) = [6] ∪ [123456] = [123456]   (fixpoint).

How can one obtain an abstract Kripke structure which is s.p. for L? It can be derived from the coarsest s.p. partition P_L for L (cf. Section 3.3). As a consequence of the results in [17], it turns out that P_L = {[1], [2], [3], [6], [45]}, because ℘(P_L) is exactly the least partitioning refinement of A (cf. Section 3.2). One can define a s.p. abstract Kripke structure A on P_L by taking R∃∃ as abstract transition relation:

[Figure: [1]^p → [2]^q → [3]^p → [45]^q → [6]^r, with self-loops on [45] and [6].]

For the abstract Kripke structure A, EF([6]) = lfp(λX′. {[6]} ∪ pre_R∃∃(X′)) is computed as follows:

  X0 = ∅;
  X1 = {[6]} ∪ pre_R∃∃(∅) = {[6]};
  X2 = {[6]} ∪ pre_R∃∃({[6]}) = {[6]} ∪ {[6], [45]} = {[6], [45]};
  X3 = {[6]} ∪ pre_R∃∃({[6], [45]}) = {[6]} ∪ {[6], [45], [3]} = {[6], [45], [3]};
  X4 = {[6]} ∪ pre_R∃∃({[6], [45], [3]}) = {[6]} ∪ {[6], [45], [3], [2]} = {[6], [45], [3], [2]};
  X5 = {[6]} ∪ pre_R∃∃({[6], [45], [3], [2]}) = {[6]} ∪ {[6], [45], [3], [2], [1]} = {[6], [45], [3], [2], [1]}   (fixpoint).

The point to observe here is that this standard approach needs more iterations to reach the fixpoint than our abstract interpretation-based approach.
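Example 6.7's iteration on the disjunctive domain can be reproduced directly; the sketch below (illustrative) runs the abstract-interpretation-based iteration on the atom closure and counts the increasing steps.

    # Replaying Example 6.7: EF^A on the disjunctive domain generated by
    # Atoms = {[1],[2],[3],[6],[245]}, with gamma = identity on sets of states.

    R = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 6)}
    ATOMS = [frozenset(a) for a in ({1}, {2}, {3}, {6}, {2, 4, 5})]
    pre = lambda X: {s for (s, t) in R if t in X}

    def alpha(S):
        """Least union of atoms covering S (A is closed under arbitrary unions)."""
        res = set()
        for s in S:
            res |= min((a for a in ATOMS if s in a), key=len)
        return frozenset(res)

    def ef_abs(S):
        """EF^A(S') = lfp(X' -> S' ∪ pre^A_R(X')), counting increasing steps."""
        x, steps = frozenset(), 0
        while True:
            y = frozenset(S) | alpha(pre(x))   # pre^A_R = alpha o pre_R o id
            if y == x:
                return x, steps
            x, steps = y, steps + 1

    fix, steps = ef_abs({6})
    print(sorted(fix), steps)   # [1, 2, 3, 4, 5, 6] reached in 3 steps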
Acknowledgements. This work was partially supported by the FIRB Project "Abstract interpretation and model checking for the verification of embedded systems" and by the COFIN2004 Project "AIDA: Abstract Interpretation Design and Applications".

References

1. E.M. Clarke, O. Grumberg, S. Jha, Y. Lu and H. Veith. Progress on the state explosion problem in model checking. In Informatics - 10 Years Back, 10 Years Ahead, LNCS 2000, pp. 176-194, 2001.
2. E.M. Clarke, O. Grumberg and D.E. Long. Model checking and abstraction. ACM Trans. Program. Lang. Syst., 16(5):1512-1542, 1994.
3. E.M. Clarke, O. Grumberg and D.A. Peled. Model Checking. The MIT Press, 1999.
4. P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proc. 4th ACM POPL, pp. 238-252, 1977.
5. P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Proc. 6th ACM POPL, pp. 269-282, 1979.
6. P. Cousot and R. Cousot. Refining model checking by abstract interpretation. Automated Software Engineering Journal, 6(1):69-95, 1999.
7. P. Cousot and R. Cousot. Temporal abstract interpretation. In Proc. 27th ACM POPL, pp. 12-25, 2000.
8. D. Dams. Abstract Interpretation and Partition Refinement for Model Checking. PhD thesis, Eindhoven University of Technology, 1996.
9. D. Dams. Flat fragments of CTL and CTL∗: separating the expressive and distinguishing powers. Logic J. of the IGPL, 7(1):55-78, 1999.
10. D. Dams, O. Grumberg and R. Gerth. Abstract interpretation of reactive systems. ACM Trans. Program. Lang. Syst., 19(2):253-291, 1997.
11. R. Giacobazzi and E. Quintarelli. Incompleteness, counterexamples and refinements in abstract model checking. In Proc. 8th SAS, LNCS 2126, pp. 356-373, 2001.
12. R. Giacobazzi, F. Ranzato and F. Scozzari. Making abstract interpretations complete. J. ACM, 47(2):361-416, 2000.
13. O. Grumberg and D.E. Long. Model checking and modular verification. ACM Trans. Program. Lang. Syst., 16(3):843-871, 1994.
14. M.R. Henzinger, T.A. Henzinger and P.W. Kopke. Computing simulations on finite and infinite graphs. In Proc. 36th FOCS, pp. 453-462, 1995.
15. C. Loiseaux, S. Graf, J. Sifakis, A. Bouajjani and S. Bensalem. Property preserving abstractions for the verification of concurrent systems. Formal Methods in System Design, 6:1-36, 1995.
16. D. Massé. Semantics for abstract interpretation-based static analyzes of temporal properties. In Proc. 9th SAS, LNCS 2477, pp. 428-443, 2002.
17. F. Ranzato and F. Tapparo. Strong preservation as completeness in abstract interpretation. In Proc. 13th ESOP, LNCS 2986, pp. 18-32, 2004.
18. F. Ranzato and F. Tapparo. An abstract interpretation-based refinement algorithm for strong preservation. In Proc. 11th TACAS, LNCS 3440, pp. 140-156, 2005.
19. D.A. Schmidt. Closed and logical relations for over- and under-approximation of powersets. In Proc. 11th SAS, LNCS 3148, pp. 22-37, 2004.

Symbolic Methods to Enhance the Precision of Numerical Abstract Domains

Antoine Miné

École Normale Supérieure, Paris, France
mine@di.ens.fr
http://www.di.ens.fr/~mine

Abstract. We present lightweight and generic symbolic methods to improve the precision of numerical static analyses based on Abstract Interpretation. The main idea is to simplify numerical expressions before they are fed to abstract transfer functions. An important novelty is that these simplifications are performed on-the-fly, using information gathered dynamically by the analyzer. A first method, called "linearization," allows abstracting arbitrary expressions into affine forms with interval coefficients while simplifying them. A second method, called "symbolic constant propagation," enhances the simplification feature of the linearization by propagating assigned expressions in a symbolic way. Combined together, these methods increase the relationality level of numerical abstract domains and make them more robust against program transformations. We show how they can be integrated within the classical interval, octagon and polyhedron domains. These methods have been incorporated within the Astrée static analyzer that checks for the absence of run-time errors in embedded critical avionics software.
We present an experimental proof of their usefulness.

1 Introduction

Ensuring the correctness of software is a difficult but important task, especially in embedded critical applications such as planes or rockets. There is currently a great need for static analyzers able to provide invariants automatically and directly on the source code. As the strongest invariants are not computable in general, such tools need to perform sound approximations at the expense of completeness. In this article, we only consider the properties of numerical variables and work in the Abstract Interpretation framework. A static analyzer is thus parameterized by a numerical abstract domain, that is, a set of computer-representable numerical properties together with algorithms to compute the semantics of program instructions.
There already exist quite a few numerical abstract domains. Well-known examples include the interval domain [5], which discovers variable bounds, and the polyhedron domain [8] for affine inequalities. Each domain achieves some cost versus precision balance. In particular, non-relational domains (e.g., the interval domain) are much faster but also much less precise than relational domains, which are able to discover variable relationships. Although the interval information seems sufficient (it allows expressing most correctness requirements, such as the absence of arithmetic overflows or of out-of-bound array accesses), relational invariants are often necessary during the course of the analysis to find tight bounds.
(This work was partially supported by the Astrée RNTL project and the APRON project from the ACI "Sécurité & Informatique.")
Consider, for instance, the program of Fig. 1, which computes the absolute value of X. We expect the analyzer to infer that, at the end of the program, Y ∈ [0, 20]. The interval domain will find the coarser result Y ∈ [−20, 20], because it cannot exploit the information Y = X during the test Y ≤ 0. The polyhedron domain is precise enough to infer the tightest bounds, but results in a loss of efficiency.

  X ← [−10, 20];
  Y ← X;
  if (Y ≤ 0) { Y ← −X; }
  // here, Y ∈ [0, 20]

  Fig. 1. Absolute value computation example

  X ← [0, 1];  Y ← [0, 0.1];  Z ← [0, 0.2];
  T ← (X × Y) + ((1 − X) × Z);
  // here, T ∈ [0, 0.2]

  Fig. 2. Linear interpolation computation example

In our second example, Fig. 2, T is linearly interpolated between Y and Z; thus, we have T ∈ [0, 0.2]. Using plain interval arithmetics, one finds the coarser result T ∈ [−0.2, 0.3]. As the assignment to T is not affine, the polyhedron domain cannot perform any better.
In this paper, we present symbolic enhancement techniques that can be applied to abstract domains to solve these problems and increase their robustness against program transformations. In Fig. 1, our symbolic constant propagation is able to propagate the information Y = X and discover tight bounds using only the interval domain. In Fig. 2, our linearization technique allows us to prove that T ∈ [0, 0.3] using the interval domain (this result is not optimal, but still much better than T ∈ [−0.2, 0.3]). The techniques are generic and can be applied to other domains, such as the polyhedron domain. However, the improvement varies greatly from one example to another, and enhanced domains do not enjoy best abstraction functions. Thus, our techniques depend upon strategies, some of which are proposed in this article.
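For intuition, the imprecision in Fig. 2 can be reproduced with a few lines of interval arithmetic. The sketch below is illustrative (real-valued, ignoring floating-point rounding); it evaluates the expanded interpolation naively and in the linearized form [0,1]×Y + [0,1]×Z obtained by the homogeneity strategy of Sect. 4.3.

    # Interval arithmetic illustration of Fig. 2 (reals, no rounding modelled).

    def iadd(a, b): return (a[0] + b[0], a[1] + b[1])
    def isub(a, b): return (a[0] - b[1], a[1] - b[0])
    def imul(a, b):
        ps = [x * y for x in a for y in b]
        return (min(ps), max(ps))

    X, Y, Z = (0.0, 1.0), (0.0, 0.1), (0.0, 0.2)

    # Naive interval evaluation of the expanded form X*Y - X*Z + Z:
    # each occurrence of X varies independently, losing the correlation.
    naive = iadd(isub(imul(X, Y), imul(X, Z)), Z)
    print(naive)          # (-0.2, 0.30000000000000004): the coarse result

    # Linearized form after intervalizing X in both products:
    # T in [0,1]*Y + [0,1]*Z.
    linearized = iadd(imul((0.0, 1.0), Y), imul((0.0, 1.0), Z))
    print(linearized)     # (0.0, 0.30000000000000004): sign recovered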
Related Work. Our linearization can be related to affine arithmetics, a technique introduced by Vinícius et al. in [16] to refine interval arithmetics by taking into account existing correlations between computed quantities. Both use a symbolic form with linear properties to allow basic algebraic simplifications. The main difference is that we relate program variables directly, while affine arithmetics introduces synthetic variables. This allows us to treat control-flow joins and loops, and to interact with relational domains, which is not possible with affine arithmetics. Our linearization was first introduced in [13] to abstract floating-point arithmetics. It is presented here with some improvements, including the introduction of several strategies.
Our symbolic constant propagation technique is similar to the classical constant propagation proposed by Kildall in [11] to perform optimization; however, scalar constants are replaced with expression trees, and our goal is not to improve the efficiency but the precision of the abstract execution. It is also related to the work of Colby, who introduces in [4] a language of transfer relations to propagate, combine and simplify, in a fully symbolic way, sequences of transfer functions. We are more modest, as we do not handle disjunctions symbolically and do not try to infer symbolic loop invariants; instead, we rely on the underlying numerical abstract domain to perform most of the semantical job. A major difference is that, while Colby's framework statically transforms the abstract equation system to be solved by the analyzer, our framework performs this transformation on-the-fly and benefits from the information dynamically inferred by the analyzer.

Overview of the Paper. The paper is organised as follows. In Sect. 2, we introduce a language (much simplified for the sake of illustration) and recall how to perform a numerical static analysis parameterized by an abstract domain. Sect. 3 then explains how symbolic expression manipulations can be soundly incorporated within the analysis. Two symbolic methods are then introduced: expression linearization, in Sect. 4, and symbolic constant propagation, in Sect. 5. Sect. 6 discusses our practical implementation within the Astrée static analyzer and presents some experimental results. We conclude in Sect. 7.

2 Framework

In this section, we briefly recall the classical design of a static analyzer using the Abstract Interpretation framework by Cousot and Cousot [6, 7]. This design is specialised towards the automatic computation of numerical invariants, and thus is parameterized by a numerical abstract domain.

2.1 Syntax of the Language

For the sake of presentation, we only consider in this article a very simplified programming language focusing on the manipulation of numerical variables. We suppose that a program manipulates only a fixed, finite set of n variables, V =def {V1, ..., Vn}, with values within a perfect mathematical set I ∈ {Z, Q, R}. A program P ∈ ℘(L × inst × L) is a single control-flow graph where nodes are program points, in L, and arcs are labelled by instructions in inst; we denote by e the entry program point. As described in Fig. 3, only two types of instructions are allowed: assignments (X ← expr) and tests (expr ⋈ 0 ?), where expr ranges over numerical expressions and ⋈ is a comparison operator.
In the syntax of expressions, classical numerical constants have been replaced with intervals [a, b] Symbolic Methods to Enhance the Precision of Numerical Abstract Domains expr ::= X | [a, b] | expr  expr X ∈V a ∈ I ∪ {−∞}, b ∈ I ∪ {+∞}, a ≤ b  ∈ { +, −, ×, / } inst ::= X ← expr | expr  0 ? X ∈V  ∈ { =, =, <, ≤, ≥, > } 351 Fig. 3. Syntax of our simple language with constant bounds—possibly +∞ or −∞. Such intervals correspond to a nondeterministic choice of a new value within the bounds each time the expression is evaluated. This will be key in defining the concept of expression abstraction in Sects. 3–5. Moreover, interval constants appear naturally in programs that fetch input values from an external environment, or when modeling rounding errors in floating-point computations. Affine forms play an important role in program analysis as they are easy to manipulate and appear frequently as program invariants. We enhance affine forms with the non-determinism of intervals by defining interval affine forms as  the expressions of the form: [a0 , b0 ] + k ([ak , bk ] × Vk ). 2.2 Concrete Semantics of the Language The concrete semantics of a program is the most precise mathematical expression of its behavior. Let us first define an environment as a function, in V → I, associating a value to each variable. We choose a simple invariant semantics that associates to each program point l ∈ L the set of all environments Xl ∈ P(V → I) that can hold when l is reached. Given an environment ρ ∈ (V → I), the semantics  expr (ρ) of an expression expr , shown in Fig. 4, is the set of values the expression can evaluate to. It outputs a set to account for non-determinism. When I = Z, the truncate function rounds the possibly non-integer result of the division towards an integer by truncation, as it is common in most computer languages. Divisions by zero are undefined, that is, return no result; for the sake of simplicity, we have not introduced any error state. The semantics of assignments and tests is defined by transfer functions {| inst |} : P(V → I) → P(V → I) in Fig. 4. The assignment transfer function returns environments where one variable has changed its value (ρ[V → x] denotes the function equal to ρ on  X (ρ)  [a, b] (ρ)  e1  e2 (ρ)  e1 /e2 (ρ)  e1 /e2 (ρ) def = = def = def = def = def { ρ(X) } {x∈I|a≤x≤b} { x  y | x ∈  e1 (ρ), y ∈  e2 (ρ) }  ∈ {+, −, ×} { truncate (x/y) | x ∈  e1 (ρ), y ∈  e2 (ρ), y = 0 } if I = Z { x/y | x ∈  e1 (ρ), y ∈  e2 (ρ), y = 0 } if I = Z def {| X ← e |}(R) = { ρ[X → v] | ρ ∈ R, v ∈  e (ρ) } def {| e  0 ? |}(R) = { ρ | ρ ∈ R and ∃ v ∈  e (ρ), v  0 holds } Fig. 4. Concrete semantics 352 A. Miné V \ {V } and that maps V to x). The test transfer function filters environments to keep only those that may satisfy the test. We can now define the semantics (Xl )l∈L of a program P as the smallest solution of the following equation system: ⎧ ⎨ Xe = V→ I (1) {| i |}(Xl ) when l = e X = ⎩ l (l ,i,l)∈P It describes the strongest invariant at each program point. 2.3 Abstract Interpretation and Numerical Abstract Domains The concrete semantics is very precise but cannot be computed fully automatically by a computer. We will only try to compute a sound overapproximation, that is, a superset of the environments reached by the program. We use Abstract Interpretation [6, 7] to design such an approximation. Numerical Abstract Domains. An analysis is parameterized by a numerical abstract domain that allows representing and manipulating selected subsets of environments. 
2.3 Abstract Interpretation and Numerical Abstract Domains

The concrete semantics is very precise but cannot be computed fully automatically by a computer. We only try to compute a sound over-approximation, that is, a superset of the environments reached by the program, and we use Abstract Interpretation [6, 7] to design such an approximation.

Numerical Abstract Domains. An analysis is parameterized by a numerical abstract domain that allows representing and manipulating selected subsets of environments. Formally, it is defined as:
- a set of computer-representable abstract elements D♯,
- a partial order ⊑ on D♯ to model the relative precision of abstract elements,
- a monotonic concretization γ : D♯ → ℘(V → I), which assigns a concrete property to each abstract element,
- a greatest element ⊤♯ for ⊑ such that γ(⊤♯) = (V → I),
- sound and computable abstract versions {| inst |}♯ of all transfer functions,
- sound and computable abstractions ∪♯ and ∩♯ of ∪ and ∩,
- a widening operator ▽ if D♯ has infinite increasing chains.

The soundness condition for the abstraction F♯ : (D♯)^n → D♯ of an n-ary operator F is: F(γ(X♯1), ..., γ(X♯n)) ⊆ γ(F♯(X♯1, ..., X♯n)). It ensures that F♯ does not forget any of F's behaviors; it can, however, introduce spurious ones.

Abstract Analysis. Given an abstract domain, an abstract version (1♯) of the equation system (1) can be derived as:

  X♯e = ⊤♯
  X♯l = ∪♯_{(l′, i, l) ∈ P} {| i |}♯(X♯l′)   when l ≠ e                          (1♯)

The soundness condition ensures that any solution of (1♯) satisfies ∀l ∈ L, γ(X♯l) ⊇ Xl. The system can be solved by iterations, using the widening operator ▽ to ensure termination; we refer the reader to Bourdoncle [2] for an in-depth description of possible iteration strategies. The computed X♯l is almost never the best abstraction (if it exists) of the concrete solution Xl. Unavoidable losses of precision come from the use of the convergence acceleration ▽, from non-necessarily best abstract transfer functions, and from the fact that the composition of best abstractions is generally not a best abstraction. This last issue explains why even the simplest semantics-preserving program transformations can drastically affect the quality of a static analysis.

Existing Numerical Domains. There exist many numerical abstract domains. We are mostly interested in those able to express variable bounds. Such abstract domains include the well-known interval domain [5] (able to express invariants of the form ⋀i Vi ∈ [ai, bi]) and the polyhedron domain [8] (affine inequalities ⋀j Σi αij Vi ≥ βj). More recent domains, in-between these two in terms of cost and precision, include the octagon domain [12] (⋀ij ±Vi ± Vj ≤ cij), the octahedron domain [3] (⋀j Σi αij Vi ≥ βj where αij ∈ {−1, 0, 1}), and the Two Variables Per Inequality domain [15] (⋀i αi V_ki + βi V_li ≤ ci).

3 Incorporating Symbolic Methods

We suppose that we are given a numerical abstract domain D♯. The gist of our method is to replace, in the abstract transfer functions {| X ← e |}♯ and {| e ⋈ 0 ? |}♯, each expression e with another one, e′, in a sound way.

Partial Order on Expressions. To define formally the notion of sound expression abstraction, we first introduce an approximation order ⪯ on expressions. A natural choice is to consider the point-wise ordering of the concrete semantics [[·]] defined in Fig. 4, that is: e1 ⪯ e2 ⟺def ∀ρ ∈ (V → I), [[e1]](ρ) ⊆ [[e2]](ρ). However, requiring the inclusion to hold for all environments is quite restrictive; more aggressive expression transformations can be enabled by only requiring soundness with respect to selected sets of environments. Our partial order ⪯ is thus defined "up to" a set of environments R ∈ ℘(V → I):

Definition 1. R ⊨ e1 ⪯ e2 ⟺def ∀ρ ∈ R, [[e1]](ρ) ⊆ [[e2]](ρ).

We denote by R ⊨ e1 = e2 the associated equality relation.

Sound Symbolic Transformations. We wish now to abstract some transfer function, e.g., {| V ← e |}, on an abstract environment R♯ ∈ D♯.
The following theorem states that, if e′ over-approximates e on γ(R♯), then it is sound to replace e with e′ in the abstract transfer functions:

Theorem 1. If γ(R♯) ⊨ e ⪯ e′, then:
• ({| V ← e |} ∘ γ)(R♯) ⊆ (γ ∘ {| V ← e′ |}♯)(R♯),
• ({| e ⋈ 0 ? |} ∘ γ)(R♯) ⊆ (γ ∘ {| e′ ⋈ 0 ? |}♯)(R♯).

4 Expression Linearization

Our first symbolic transformation is an abstraction of arbitrary expressions into interval affine forms i0 + Σk (ik × Vk), where the i's stand for intervals.

4.1 Definitions

Interval Affine Form Operators. We first introduce a few operators to manipulate interval affine forms in a symbolic way. Using the classical interval arithmetic operators (denoted with an I superscript), we can define point-wisely the addition ⊕ and subtraction ⊖ of affine forms, as well as the multiplication ⊗ and division ⊘ of an affine form by a constant interval:

Definition 2.
• (i0 + Σk ik × Vk) ⊕ (i′0 + Σk i′k × Vk) =def (i0 +I i′0) + Σk (ik +I i′k) × Vk,
• (i0 + Σk ik × Vk) ⊖ (i′0 + Σk i′k × Vk) =def (i0 −I i′0) + Σk (ik −I i′k) × Vk,
• i ⊗ (i0 + Σk ik × Vk) =def (i ×I i0) + Σk (i ×I ik) × Vk,
• (i0 + Σk ik × Vk) ⊘ i =def (i0 /I i) + Σk (ik /I i) × Vk,

where the interval arithmetic operators are defined classically as:
• [a, b] +I [a′, b′] =def [a + a′, b + b′],
• [a, b] −I [a′, b′] =def [a − b′, b − a′],
• [a, b] ×I [a′, b′] =def [min(aa′, ab′, ba′, bb′), max(aa′, ab′, ba′, bb′)],
• [a, b] /I [a′, b′] =def [−∞, +∞] if 0 ∈ [a′, b′], and [min(a/a′, a/b′, b/a′, b/b′), max(a/a′, a/b′, b/a′, b/b′)] otherwise, where, when I = Z, the lower and upper bounds are rounded, respectively, towards −∞ and +∞.

The following theorem states that these operators are always sound and, in some cases, complete, i.e., ⪯ can be replaced by =:

Theorem 2. For all interval affine forms l1, l2 and every interval i, we have:
• I^V ⊨ l1 + l2 = l1 ⊕ l2,
• I^V ⊨ l1 − l2 = l1 ⊖ l2,
• I^V ⊨ i × l1 = i ⊗ l1 if I ≠ Z, and I^V ⊨ i × l1 ⪯ i ⊗ l1 otherwise,
• I^V ⊨ l1 / i = l1 ⊘ i if I ≠ Z and 0 ∉ i, and I^V ⊨ l1 / i ⪯ l1 ⊘ i otherwise.

(I^V denotes the set of all environments V → I.) When I = Z, we must conservatively round upper and lower bounds towards +∞ and −∞, respectively, to ensure that Thm. 2 holds. The non-exactness of the multiplication and division can then lead to some precision degradation. For instance, (X ⊘ 2) ⊗ 2 evaluates to [0, 2] × X since, when computing X ⊘ 2, the non-integral value 1/2 must be abstracted into the integral interval [0, 1]. One solution is to perform all computations in R, keeping in mind that, due to truncation, l / [a, b] should be interpreted, when 0 ∉ [a, b], as (l ⊘ [a, b]) ⊕ [−1 + x, 1 − x], where x = 1/min(|a|, |b|). We then obtain the more precise result X + [−1, 1].
We now introduce a so-called "intervalization" operator ι to abstract interval affine forms into intervals. Given an abstract environment, it evaluates the affine form using interval arithmetics. Suppose that D♯ provides us with projection operators πk : D♯ → ℘(I) able to return an interval over-approximation for each variable Vk. We define ι as:

Definition 3. ι(i0 + Σk (ik × Vk))(R♯) =def i0 +I Σ^I_k (ik ×I πk(R♯)), where each πk(R♯) is an interval containing {ρ(Vk) | ρ ∈ γ(R♯)}.

The following theorem states that ι is a sound operator with respect to R♯:

Theorem 3. γ(R♯) ⊨ l ⪯ ι(l)(R♯).

As πk performs a non-relational abstraction, ι incurs a loss of precision whenever D♯ is a relational domain. Consider, for instance, R♯ such that γ(R♯) = {ρ ∈ ({V1, V2} → [0, 1]) | ρ(V1) = ρ(V2)}. Then, ι(V1 − V2)(R♯) is the constant interval [−1, 1], while [[V1 − V2]] is 0 on γ(R♯).
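Definition 2 and ι translate directly into code. The sketch below is illustrative (finite real bounds only, no infinities and no integer rounding); it represents an interval affine form as a constant interval plus a map from variable names to interval coefficients.

    # Interval affine forms (Def. 2) and intervalization (Def. 3), sketched
    # for finite real bounds. A form is (i0, {var: coeff}), intervals as pairs.

    def i_add(a, b): return (a[0] + b[0], a[1] + b[1])
    def i_mul(a, b):
        ps = [x * y for x in a for y in b]
        return (min(ps), max(ps))

    def af_add(l1, l2):                    # l1 ⊕ l2, point-wise on coefficients
        c1, m1 = l1; c2, m2 = l2
        zero = (0.0, 0.0)
        return (i_add(c1, c2),
                {k: i_add(m1.get(k, zero), m2.get(k, zero))
                 for k in set(m1) | set(m2)})

    def af_scale(i, l):                    # i ⊗ l, interval times affine form
        c, m = l
        return (i_mul(i, c), {k: i_mul(i, ik) for k, ik in m.items()})

    def intervalize(l, pi):                # ι(l)(R♯) with pi: var -> interval
        c, m = l
        for k, ik in m.items():
            c = i_add(c, i_mul(ik, pi[k]))
        return c

    # [0,1]·Y + [0,1]·Z in an environment with Y ∈ [0,0.1], Z ∈ [0,0.2]:
    form = af_add(af_scale((0.0, 1.0), ((0.0, 0.0), {'Y': (1.0, 1.0)})),
                  af_scale((0.0, 1.0), ((0.0, 0.0), {'Z': (1.0, 1.0)})))
    print(intervalize(form, {'Y': (0.0, 0.1), 'Z': (0.0, 0.2)}))  # (0.0, 0.300...)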
Linearization. The linearization ⟨e⟩(R♯) of an arbitrary expression e in an abstract environment R♯ can now be defined by structural induction as follows:

Definition 4.
• ⟨V⟩(R♯) =def [1, 1] × V,
• ⟨[a, b]⟩(R♯) =def [a, b],
• ⟨e1 + e2⟩(R♯) =def ⟨e1⟩(R♯) ⊕ ⟨e2⟩(R♯),
• ⟨e1 − e2⟩(R♯) =def ⟨e1⟩(R♯) ⊖ ⟨e2⟩(R♯),
• ⟨e1 / e2⟩(R♯) =def ⟨e1⟩(R♯) ⊘ ι(⟨e2⟩(R♯))(R♯),
• ⟨e1 × e2⟩(R♯) =def either ι(⟨e1⟩(R♯))(R♯) ⊗ ⟨e2⟩(R♯) or ι(⟨e2⟩(R♯))(R♯) ⊗ ⟨e1⟩(R♯) (see Sect. 4.3).

The ι operator is used to deal with non-linear constructions: the right argument of a division and either argument of a multiplication are intervalized. As a consequence of Thms. 2 and 3, our linearization is sound:

Theorem 4. γ(R♯) ⊨ e ⪯ ⟨e⟩(R♯).

Obviously, ⟨·⟩ generally incurs a loss of precision with respect to ⪯. Also, ⟨e⟩ is not monotonic in its e argument. Consider, for instance, X/X in an environment R♯ such that πX(R♯) = [1, +∞]. Although γ(R♯) ⊨ X/X ⪯ [1, 1], we do not have γ(R♯) ⊨ ⟨X/X⟩(R♯) ⪯ ⟨[1, 1]⟩(R♯), as ⟨X/X⟩(R♯) = [0, 1] × X. It is important to note that there is no useful notion of best abstraction of expressions for ⪯.

4.2 Integration with a Numerical Abstract Domain

Given an abstract domain D♯, we can now derive a new abstract domain with linearization, D♯_L, identical to D♯ except for the following transfer functions:

  {| V ← e |}♯_L(R♯) =def {| V ← ⟨e⟩(R♯) |}♯(R♯)
  {| e ⋈ 0 ? |}♯_L(R♯) =def {| ⟨e⟩(R♯) ⋈ 0 ? |}♯(R♯)

The soundness of these transfer functions is guaranteed by Thms. 1 and 4.

Application to the Interval Domain. Like all non-relational domains, the interval domain [5] is not able to exploit the fact that the same variable occurs several times in an expression. Our linearization performs some symbolic simplification, and so is able to partly correct this problem. Consider, for instance, the assignment {| Y ← 3 × X − X |}♯ in an abstract environment such that X ∈ [a, b]. The regular interval domain D♯_I will assign [3a − b, 3b − a] to Y, while D♯_IL will assign [2a, 2b], as ⟨3 × X − X⟩(R♯) = 2 × X. This last answer is strictly more precise whenever a ≠ b. Using the exactness results of Thm. 2, one can prove that, when I ≠ Z, the assignment in D♯_IL is always at least as precise as in D♯_I. This may not be the case for a test, or when I = Z.

Application to the Octagon Domain. The octagon domain [12] is more precise than the interval one, but it is more complex; as a consequence, it is quite difficult to design abstract transfer functions for non-linear expressions. This problem can be solved by using our linearization in combination with the efficient and rather precise abstract transfer functions for interval affine forms proposed in our previous work [14]. The octagon domain with linearization is able to prove, for instance, that after the assignment X ← T × Y, in an environment such that T ∈ [−1, 1], we have −Y ≤ X ≤ Y.

Application to the Polyhedron Domain. The polyhedron domain [8] is more precise than the octagon domain, but it cannot deal with full interval affine forms: only the constant coefficient may safely be an interval. To solve this problem, we introduce a function µ which abstracts interval affine forms further by making all variable coefficients singletons. For the sake of conciseness, we give a formula valid only for I ≠ Z and finite interval bounds:

Definition 5.
µ([a0, b0] + Σk [ak, bk] × Vk)(R♯) =def
  ([a0, b0] +I Σ^I_k [(ak − bk)/2, (bk − ak)/2] ×I πk(R♯)) + Σk ((ak + bk)/2) × Vk

µ works by "distributing" the weight bk − ak of each variable coefficient into the constant component, using variable-bounds information from R♯. One can prove that µ is sound, that is, γ(R♯) ⊨ l ⪯ µ(l)(R♯).

Application to Floating-Point Arithmetics. Real-life programming languages do not manipulate rationals or reals, but floating-point numbers, which are much more difficult to abstract. Pervasive rounding must be taken into account. As most classical properties of arithmetic operators are no longer true, it is generally not safe to feed floating-point expressions to relational domains. One solution is to convert such expressions into real-valued expressions by making rounding explicit. Rounding is highly non-linear, but it can be abstracted using intervals. For instance, X + Y in the floating-point world can be abstracted into [1 − ε1, 1 + ε1] × X + [1 − ε1, 1 + ε1] × Y + [−ε2, ε2], using small constants ε1 and ε2 modelling, respectively, relative and absolute rounding errors. This fits in our linearization framework, which can be extended to treat floating-point arithmetics soundly. We refer the reader to related work [13] for more information.

4.3 Multiplication Strategies

When encountering a multiplication e1 × e2 such that neither ⟨e1⟩(R♯) nor ⟨e2⟩(R♯) evaluates to an interval, we must intervalize one of the two arguments. Both choices are valid, but they greatly influence the precision of the result.

All-Cases Strategy. A first idea is to try both choices for each multiplication; we then get a set of linearized expressions. As we have no notion of greatest lower bound on expressions, we must evaluate a transfer function for all the expressions in parallel and take the intersection ∩♯ of the resulting abstract elements in D♯. Unfortunately, the cost is exponential in the number of multiplications in the original expression, hence the need for deterministic strategies that always select one interval affine form.

Interval-Size Strategy. A simple strategy is to intervalize the affine form that yields the narrower interval. This greedy approach tries to limit the amplitude of the non-determinism introduced by multiplications. The extreme case holds when the amplitude of one interval is zero, meaning that the sub-expression is semantically a constant; intervalizing it will not result in any precision loss. Finally, note that the relative amplitude (b − a)/|a + b| may be more significant than the absolute amplitude b − a if we prefer to intervalize expressions that are constant up to some small relative rounding error.

Simplification-Driven Strategy. Another idea is to maximize the amount of simplification by not intervalizing, when possible, sub-expressions containing variables that appear in other sub-expressions. For instance, in X − (Y × X), Y will be intervalized, which yields [1 − max Y, 1 − min Y] × X. Unlike the preceding greedy approach, this strategy is global and treats the expression as a whole.
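Definition 4 plus a multiplication strategy gives a complete recursive linearizer. The sketch below is illustrative (real bounds, interval-size strategy, division omitted for brevity); it builds on the affine-form helpers sketched after Definition 3.

    # Recursive linearization <e>(R♯) (Def. 4) with the interval-size strategy.
    # Reuses i_add/i_mul/af_add/af_scale/intervalize from the previous sketch;
    # expressions are ('var', X) | ('itv', a, b) | ('+'|'-'|'*', e1, e2).

    def af_neg(l):
        return af_scale((-1.0, -1.0), l)

    def linearize(e, pi):
        kind = e[0]
        if kind == 'var':
            return ((0.0, 0.0), {e[1]: (1.0, 1.0)})   # [1,1] × V
        if kind == 'itv':
            return ((e[1], e[2]), {})                  # constant interval
        op, e1, e2 = e
        l1, l2 = linearize(e1, pi), linearize(e2, pi)
        if op == '+':
            return af_add(l1, l2)
        if op == '-':
            return af_add(l1, af_neg(l2))
        # multiplication: intervalize the operand with the narrower range
        i1, i2 = intervalize(l1, pi), intervalize(l2, pi)
        if i1[1] - i1[0] <= i2[1] - i2[0]:
            return af_scale(i1, l2)
        return af_scale(i2, l1)

    pi = {'X': (0.0, 2.0)}
    l = linearize(('-', ('*', ('itv', 3.0, 3.0), ('var', 'X')), ('var', 'X')), pi)
    print(l)   # ((0.0, 0.0), {'X': (2.0, 2.0)}): 3×X − X simplifies to 2×X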
Homogeneity Strategy. We now consider the linear interpolation of Fig. 2. In order to achieve the best precision, it is important to intervalize X in both multiplications. This yields T ← [0, 1] × Y + [0, 1] × Z, and we are able to prove that T ≥ 0; however, we find that T ≤ 0.3 while in fact T ≤ 0.2. The interval-size strategy would choose to intervalize Y and Z, which have smaller ranges than X; this yields the imprecise assignment T ← [−0.2, 0.1] × X + [0, 0.2]. Likewise, the simplification-driven strategy may choose to keep X, which appears in two sub-expressions, and also intervalize both Y and Z. To solve this problem, we propose to intervalize the smallest set of variables that makes the expression homogeneous, that is, such that the arguments of + and − operators have the same degree. In order to make the (1 − X) sub-expression homogeneous, X is intervalized. This last strategy is quite robust: it keeps working if we change the assignment into the equivalent T ← X × Y − X × Z + Z, or if we consider bi-linear interpolations or interpolations with normalization coefficients.

4.4 Concluding Remark

Our linearization is not equivalent to a static program transformation. To cope with non-linearity as best we can, we exploit the information dynamically inferred by the analysis: first in the intervalization ι, then in the multiplication strategy. Both algorithms take as argument the current numerical abstract environment R♯. As, dually, the linearization improves the precision of the next computed abstract element, the dynamic nature of our approach ensures a positive feedback.

5 Symbolic Constant Propagation

The automatic symbolic simplification implied by our linearization allows us to gain much precision when dealing with complex expressions, without the burden of designing new abstract domains tailored to them. However, the analysis is still sensitive to program transformations that decompose expressions and introduce new temporary variables, such as common sub-expression elimination or register spilling. In order to be immune to this problem, one must generally use an expressive, and so costly, relational domain. We propose an alternate, lightweight solution based on a kind of constant domain that tracks assignments dynamically and propagates symbolic expressions within transfer functions.

5.1 The Symbolic Constant Domain

Enriched Expressions. We denote by C the set of all syntactic expressions, enriched with one element ⊤_C denoting "any value." The flat ordering ⊑_C is defined by X ⊑_C Y ⟺ Y = ⊤_C or X = Y. The concrete semantics [[·]] of Fig. 4 is extended to ⊤_C by [[⊤_C]](ρ) = I. We also use two functions on expression trees: occ : C → ℘(V), which returns the set of variables occurring in an expression, and subst : C × V × C → C, which substitutes, in its first argument, every occurrence of a given variable by its last argument. Their definition on non-⊤_C elements is quite standard and we do not present it here. They are extended to ⊤_C as follows: occ(⊤_C) =def ∅, while subst(e, V, ⊤_C) equals e when V ∉ occ(e), and ⊤_C when V ∈ occ(e).

Abstract Symbolic Environments. The symbolic constant domain is the set D_C =def V → C restricted as follows: there must be no cyclic dependencies in a map S_C ∈ D_C, that is, no pair-wise distinct variables V1, ..., Vn such that ∀i < n, Vi ∈ occ(S_C(Vi+1)) and Vn ∈ occ(S_C(V1)). The partial order ⊑_C on D_C is the point-wise extension of that on C. Each element S_C ∈ D_C represents the set of environments compatible with the symbolic information:

Definition 6. γ_C(S_C) =def {ρ ∈ (V → I) | ∀k, ρ(Vk) ∈ [[S_C(Vk)]](ρ)}.

Main Theorem. Our approach relies on the fact that applying a substitution from S_C to any expression is sound with respect to γ_C(S_C):

Theorem 5. ∀e, V, S_C : γ_C(S_C) ⊨ e ⪯ subst(e, V, S_C(V)).
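The symbolic constant domain is essentially a map from variables to expression trees with an acyclicity invariant. A small sketch follows (illustrative; TOP plays the role of ⊤_C, and expressions are the tuples of the earlier sketches).

    # Sketch of the symbolic constant domain: occ, subst and the acyclicity
    # invariant. TOP means "any value".

    TOP = ('top',)

    def occ(e):
        if e == TOP or e[0] == 'itv':
            return set()
        if e[0] == 'var':
            return {e[1]}
        return occ(e[1]) | occ(e[2])

    def subst(e, v, f):
        if f == TOP:
            return e if v not in occ(e) else TOP
        if e == TOP or e[0] == 'itv':
            return e
        if e[0] == 'var':
            return f if e[1] == v else e
        return (e[0], subst(e[1], v, f), subst(e[2], v, f))

    def acyclic(S):
        """No dependency cycle among the variables of the map S: var -> expr."""
        def reach(v, seen):
            if v in seen:
                return False
            return all(reach(w, seen | {v}) for w in occ(S.get(v, TOP)))
        return all(reach(v, set()) for v in S)

    S = {'Y': ('var', 'X')}                    # after Y <- X in Fig. 1
    test = ('-', ('var', 'Y'), ('itv', 0, 0))  # the guard Y <= 0, as Y - 0
    print(subst(test, 'Y', S['Y']))            # ('-', ('var','X'), ('itv',0,0))
    print(acyclic(S))                          # True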
Abstract Operators. We now define the following operators on D^C:

Definition 7.
• {|V ← e|}^C(S^C)(V_k) ≝ subst(e, V, S^C(V)) if V_k = V, and subst(S^C(V_k), V, S^C(V)) if V_k ≠ V
• {|e ⋈ 0 ?|}^C(S^C) ≝ S^C
• (S^C ∪^C T^C)(V_k) ≝ S^C(V_k) if S^C(V_k) = T^C(V_k), and ⊤_C otherwise
• S^C ∩^C T^C ≝ S^C

Our assignment V ← e first substitutes V with S^C(V) in S^C and e before adding the information that V maps to the substituted e. This is necessary to remove all prior information on V (no longer valid after the assignment) and to prevent the appearance of dependency cycles. As we are only interested in propagating assignments, tests are abstracted as the identity, which is sound but coarse. Our union abstraction only keeps syntactically equal expressions; this corresponds to the least upper bound with respect to ⊑^C. Our intersection keeps only the information of the left argument. All these operators respect the non-cyclicity condition. Note that one could be tempted to refine the intersection by mixing information from the left and right arguments in order to minimize the number of variables mapping to ⊤_C. Unfortunately, careless mixing may break the non-cyclicity condition. As a simpler but safe solution, we settled on keeping the left argument. Finally, we do not need any widening: at each abstract iteration, unstable symbolic expressions are directly replaced with ⊤_C when applying ∪^C, and so become stable.

5.2 Integration with a Numerical Abstract Domain

Given a numerical abstract domain D♯, the domain D^×C is obtained by combining D♯ with D^C in the following way:

Definition 8.
• D^×C ≝ D♯ × D^C,
• ⊑^×C, ∪^×C and ∩^×C are defined pair-wise, e.g., ∪^×C = ∪♯ × ∪^C,
• γ^×C(R♯, S^C) ≝ γ♯(R♯) ∩ γ^C(S^C),
• {|V ← e|}^×C(R♯, S^C) ≝ ({|V ← strat(e, S^C)|}♯_L(R♯), {|V ← e|}^C(S^C)),
• {|e ⋈ 0 ?|}^×C(R♯, S^C) ≝ ({|strat(e, S^C) ⋈ 0 ?|}♯_L(R♯), {|e ⋈ 0 ?|}^C(S^C)),

where strat(e, S^C) is a substitution strategy that may perform sequences of substitutions of the form f ↦ subst(f, V, S^C(V)) in e, for any variables V. All information in D^C and D♯ is computed independently, except that the symbolic information is used in the transfer functions for D♯_L. The next section discusses the choice of a strategy strat. Note that, although we chose in this presentation to abstract the semantics of Fig. 4, our construction can be used on any class of expressions, including floating-point and non-numerical expressions.

5.3 Substitution Strategies

Any sequence of substitutions extracted from the current symbolic constant information is sound, but some give better results than others. As for the intervalization of Sect. 4.3, we rely on carefully designed strategies.

Full Propagation. Thanks to the non-cyclicity of elements S^C ∈ D^C, we can safely perform all substitutions f ↦ subst(f, V, S^C(V)), for all V, in any order, and reach a normal form. This gives a first, basic substitution strategy. However, because our goal is to perform linearization-driven simplifications, it is important to avoid substituting with variable-free expressions, or we may lose correlations between multiple occurrences of variables. For instance, full substitution in the assignment Z ← X − 0.5 × Y with the environment S^C = [X ↦ [0, 1], Y ↦ X] results in Z ← [0, 1] − 0.5 × [0, 1], and so Z ∈ [−0.5, 1]. Avoiding variable-free substitutions, this gives Z ← X − 0.5 × X, and so Z ∈ [0, 0.5], which is more precise. This refined strategy also succeeds in proving that Y ∈ [0, 20] in the example of Fig. 1, by substituting Y with X in the test Y ≤ 0.
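As an illustration of Definition 7 and of the full propagation strategy, here is a hedged Python sketch; it reuses the hypothetical occ/subst helpers and TOP sentinel from the previous sketch, represents S^C as a dictionary (absent keys meaning ⊤_C), and assumes non-cyclic inputs:

    def assign(S, v, e):
        """Abstract assignment v <- e: first substitute v away everywhere
        (removing stale facts and preventing cycles), then record v -> e."""
        old = S.get(v, TOP)
        S2 = {w: subst(f, v, old) for w, f in S.items() if w != v}
        S2[v] = subst(e, v, old)
        return S2

    def join(S, T):
        """Abstract union: keep only syntactically equal bindings."""
        return {w: S[w] for w in S if w in T and S[w] == T[w]}

    def meet(S, T):
        """Abstract intersection: simply keep the left argument (safe)."""
        return S

    def strat_full(e, S):
        """Full propagation, skipping TOP and variable-free right-hand
        sides so that correlations between occurrences are preserved.
        Terminates because S is assumed non-cyclic."""
        changed = True
        while changed:
            changed = False
            for v in occ(e):
                f = S.get(v, TOP)
                if f != TOP and occ(f):        # skip variable-free bindings
                    e2 = subst(e, v, f)
                    if e2 != e:
                        e, changed = e2, True
        return e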
Enforcing Determinism and Linearity. Non-determinism in expressions is a major source of precision loss. Thus, a strategy is to avoid substituting V with S^C(V) whenever #((⟦S^C(V)⟧ ∘ γ)(X♯)) > 1, that is, whenever the substituted expression may evaluate to more than one value on γ(X♯). As this property is not easily computed, we propose the following sufficient syntactic criterion: S^C(V) should not be ⊤_C nor contain a non-singleton interval. This strategy gives the expected result in the example of Fig. 1. Likewise, one may wish to avoid substituting with non-linear expressions, as they must subsequently be intervalized, which is a cause of precision loss. However, disabling too many substitutions may prevent the linearization step from exploiting correlations. Suppose that we break the last assignment of Fig. 2 into three parts: U ← X × Y; V ← (1 − X) × Z; T ← U − V. Then the interval domain with linearization and symbolic constant propagation will not be able to prove that T ∈ [0, 0.3] unless we allow substituting, in T, U and V with their non-linear symbolic values.

Gaining More Precision. More precision can be achieved by slightly altering the definition of D^×C. A simple but effective idea is to allow several strategies, compute several transfer functions in D♯ in parallel, and take the abstract intersection ∩♯ of the results. Another idea is to perform reductions from D^C to D♯ after each transfer function: X♯ is replaced with {|V_k − S^C(V_k) = 0 ?|}♯(X♯) for some k. Reductions can be iterated to increase the precision, following Granger's local iterations scheme [10].

6 Application to the Astrée Analyzer

Astrée is an efficient static analyzer focusing on the detection of run-time errors for programs written in a subset of the C programming language, excluding recursion, dynamic memory allocation and concurrent executions. It aims at a degree of precision sufficient to actually prove the absence of run-time errors. This is achieved by specializing the analyzer towards specific program families, introducing various abstract domains, and setting iteration strategy parameters. Currently, the considered family of programs is that of safety-critical embedded fly-by-wire avionics software, featuring large reactive loops running for billions of iterations, thousands of global state variables, and pervasive floating-point arithmetic. We refer the reader to [1] for more detailed information on Astrée.

Integrating the Symbolic Methods. Astrée uses a partially reduced product of several numerical abstract domains, together with both of our symbolic enhancement methods. Relational domains, such as the octagon [12] or digital filtering [9] domains, rely on the linearization to abstract complex floating-point expressions into interval affine forms on reals. The interval domain is refined by combining three versions of each transfer function: first, using the expression unchanged; second, using the linearized expression; third, applying symbolic constant propagation followed by linearization. We use the simplification-driven multiplication strategy, as well as the full propagation strategy, without propagating variable-free expressions.

Experimental Results. We present analysis results on several programs. All the analyses were carried out on a 64-bit AMD Opteron 248 (2 GHz) workstation running Linux, using a single processor.
The following table compares the precision and efficiency of Astrée before and after enabling our two symbolic methods:

                     without enhancements                    with enhancements
    code size   analysis   nb. of   memory   alarms    analysis   nb. of   memory   alarms
    in lines    time       iters.                      time       iters.
    370         1.8s       17       16 MB    0         3.1s       17       16 MB    0
    9 500       90s        39       80 MB    8         160s       39       81 MB    8
    70 000      2h 40mn    141      559 MB   391       1h 16mn    44       582 MB   0
    226 000     11h 16mn   150      1.3 GB   141       6h 36mn    86       1.3 GB   1
    400 000     22h 9mn    172      2.2 GB   282       13h 52mn   96       2.2 GB   0

The precision gain is quite impressive, as up to hundreds of alarms are removed. In two cases, this increase in precision is sufficient to achieve zero alarms, which actually proves the absence of run-time errors. Moreover, the increase in memory consumption is negligible. Finally, on our largest examples, our enhancement methods save analysis time: although each abstract iteration is more costly (by up to 25%), this is compensated by the reduced number of iterations needed to stabilize our invariants, as a smaller state space is explored.

Discussion. It is possible to use the symbolic constant propagation in relational domains as well, but this was not needed in our examples to remove alarms. Our experiments show that, even though the linearization and constant propagation techniques on intervals are not as robust as fully relational abstract domains, they are quite versatile thanks to their parametrization in terms of strategies, and much simpler to implement than even a simple relational abstract domain. Moreover, our methods exhibit a near-linear time and memory cost, which is much more efficient than relational domains.

7 Conclusion

We have proposed, in this article, two techniques, called linearization and symbolic constant propagation, that can be combined to improve the precision of numerical abstract domains. In particular, we are able to compensate for the lack of non-linear transfer functions in the polyhedron and octagon domains, and for a weak or nonexistent level of relationality in the octagon and interval domains. Finally, they help make abstract domains robust against program transformations. Thanks to their parametrization in terms of strategies, they can be finely tuned to take into account semantic as well as syntactic program features. They are also very lightweight in terms of both analysis and development costs. We found that, in many cases, it is easier and faster to design a pair of linearization and symbolic propagation strategies to solve a local loss of precision in some program, while keeping the interval abstract domain, than to develop a specific relational abstract domain able to represent the required local properties. Strategies also proved reusable on programs belonging to the same family. Practical results obtained within the Astrée static analyzer show that our methods both increase the precision and save analysis time. They were key in proving the absence of run-time errors in real-life critical embedded avionics software.

Future Work. Because the precision gain strongly depends upon the multiplication strategy used in our linearization and the propagation strategy used in the symbolic constant domain, a natural extension of our work is to try and design new such strategies, adapted to different practical cases. A more challenging task would be to provide theoretical guarantees that some strategies make abstract domains immune to given classes of program transformations.
Acknowledgments. We would like to thank all the former and present members of the Astrée team: B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, D. Monniaux and X. Rival. We would also like to thank the anonymous referees for their useful comments.

References

[1] B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival. A static analyzer for large safety-critical software. In ACM PLDI'03, pages 196–207. ACM Press, 2003.
[2] F. Bourdoncle. Efficient chaotic iteration strategies with widenings. In FMPA'93, volume 735 of LNCS, pages 128–141. Springer, 1993.
[3] R. Clarisó and J. Cortadella. The octahedron abstract domain. In SAS'04, volume 3148 of LNCS, pages 312–327. Springer, 2004.
[4] C. Colby. Semantics-Based Program Analysis via Symbolic Composition of Transfer Relations. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, 1996.
[5] P. Cousot and R. Cousot. Static determination of dynamic properties of programs. In ISOP'76, pages 106–130. Dunod, Paris, France, 1976.
[6] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In ACM POPL'77, pages 238–252. ACM Press, 1977.
[7] P. Cousot and R. Cousot. Abstract interpretation and application to logic programs. Journal of Logic Programming, 13(2–3):103–179, 1992.
[8] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In ACM POPL'78, pages 84–97. ACM Press, 1978.
[9] J. Feret. Static analysis of digital filters. In ESOP'04, volume 2986 of LNCS. Springer, 2004.
[10] P. Granger. Improving the results of static analyses of programs by local decreasing iteration. In FSTTCS, volume 652 of LNCS, pages 68–79. Springer, 1992.
[11] G. Kildall. A unified approach to global program optimization. In ACM POPL'73, pages 194–206. ACM Press, 1973.
[12] A. Miné. The octagon abstract domain. In AST 2001 (in WCRE 2001), pages 310–319. IEEE CS Press, 2001.
[13] A. Miné. Relational abstract domains for the detection of floating-point run-time errors. In ESOP'04, volume 2986 of LNCS, pages 3–17. Springer, 2004.
[14] A. Miné. Weakly Relational Numerical Abstract Domains. PhD thesis, École Polytechnique, Palaiseau, France, December 2004.
[15] A. Simon, A. King, and J. Howe. Two variables per linear inequality as an abstract domain. In LOPSTR'02, volume 2664 of LNCS, pages 71–89. Springer, 2002.
[16] M. V. A. Andrade, J. L. D. Comba, and J. Stolfi. Affine arithmetic. In INTERVAL'94, 1994.

Synthesis of Reactive(1) Designs

Nir Piterman¹, Amir Pnueli², and Yaniv Sa'ar³

¹ EPFL - I&C - MTC, 1015 Lausanne, Switzerland. firstname.lastname@epfl.ch
² Department of Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel. firstname.lastname@weizmann.ac.il
³ Department of Computer Science, Ben Gurion University, Beer-Sheva, Israel. saary@cs.bgu.ac.il

Abstract. We consider the problem of synthesizing digital designs from their LTL specification. In spite of the theoretical double exponential lower bound for the general case, we show that for many expressive specifications of hardware designs the problem can be solved in time N³, where N is the size of the state space of the design. We describe the context of the problem, as part of the Prosyd European Project which aims to provide a property-based development flow for hardware designs.
Within this project, synthesis plays an important role, first in order to check whether a given specification is realizable, and then for synthesizing part of the developed system. The class of LTL formulas considered is that of Generalized Reactivity(1) (generalized Streett(1)) formulas, i.e., formulas of the form

    (□◇p_1 ∧ ⋯ ∧ □◇p_m) → (□◇q_1 ∧ ⋯ ∧ □◇q_n)

where each p_i, q_j is a Boolean combination of atomic propositions. We also consider the more general case in which each p_i, q_j is an arbitrary past LTL formula over atomic propositions. For this class of formulas, we present an N³-time algorithm which checks whether such a formula is realizable, i.e., whether there exists a circuit which satisfies the formula under any set of inputs provided by the environment. In the case that the specification is realizable, the algorithm proceeds to construct an automaton which represents one of the possible implementing circuits. The automaton is computed and presented symbolically.

1 Introduction

One of the most ambitious and challenging problems in reactive systems construction is the automatic synthesis of programs and (digital) designs from logical specifications. First identified as Church's problem [Chu63], several methods have been proposed for its solution ([BL69], [Rab72]). The two prevalent approaches to solving the synthesis problem were reducing it to the emptiness problem of tree automata, and viewing it as the solution of a two-person game. In these preliminary studies of the problem, the logical specification that the synthesized system should satisfy was given as an S1S formula.

This problem was considered again in [PR89a] in the context of synthesizing reactive modules from a specification given in Linear Temporal Logic (LTL). This followed two previous attempts ([CE81], [MW84]) to synthesize programs from temporal specifications which reduced the synthesis problem to satisfiability, ignoring the fact that the environment should be treated as an adversary. The method proposed in [PR89a] for a given LTL specification ϕ starts by constructing a Büchi automaton B_ϕ, which is then determinized into a deterministic Rabin automaton. This double translation may reach a complexity doubly exponential in the size of ϕ. Once the Rabin automaton is obtained, the game can be solved in time n^O(k), where n is the number of states of the automaton and k is the number of accepting pairs.

The high complexity established in [PR89a] caused the synthesis process to be identified as hopelessly intractable and discouraged many practitioners from ever attempting to use it for any sizeable system development. Yet there exist several interesting cases where, if the specification of the design to be synthesized is restricted to simpler automata or partial fragments of LTL, it has been shown that the synthesis problem can be solved in polynomial time.

(This research was supported in part by the Israel Science Foundation (grant no. 106/02-1), European community project Prosyd, the John von-Neumann Minerva center for Verification of Reactive Systems, NSF grant CCR-0205571, ONR grant N00014-99-1-0131, and SRC grant 2004-TJ-1256.)
Representative cases are the work in [AMPS98], which presents (besides the generalization to real time) efficient polynomial (N²) solutions to games (and hence synthesis problems) where the acceptance condition is one of the LTL formulas □p, ◇q, □◇p, or ◇□q. A more recent paper is [AT04], which presents efficient synthesis approaches for the LTL fragment consisting of Boolean combinations of formulas of the form ◇p.

This paper can be viewed as a generalization of the results of [AMPS98] and [AT04] to the wider class of generalized Reactivity(1) formulas (GR(1)), i.e., formulas of the form

    (□◇p_1 ∧ ⋯ ∧ □◇p_m) → (□◇q_1 ∧ ⋯ ∧ □◇q_n)    (1)

Following the developments in [KPP05], we show how any synthesis problem whose specification is a GR(1) formula can be solved in time N³, where N is the size of the state space of the design. Furthermore, we present a (symbolic) algorithm for extracting a design (program) which implements the specification. We argue that the class of GR(1) formulas is sufficiently expressive to provide complete specifications of many designs.

This work has been developed as part of the Prosyd project (see www.prosyd.org), which aims at the development of a methodology and a tool suite for the property-based construction of digital circuits from their temporal specifications. Within the Prosyd project, synthesis techniques are applied to check first whether a set of properties is realizable, and then to automatically produce digital designs of smaller units.

2 Preliminaries

2.1 Linear Temporal Logic

We assume a countable set of Boolean variables (propositions) V. LTL formulas are constructed as follows:

    ϕ ::= p | ¬ϕ | ϕ ∨ ϕ | ◯ϕ | ϕ U ϕ

As usual, we denote ¬(¬ϕ ∨ ¬ψ) by ϕ ∧ ψ, true U ϕ by ◇ϕ, and ¬◇¬ϕ by □ϕ. A formula that does not include temporal operators is a Boolean formula. A model σ for a formula ϕ is an infinite sequence of truth assignments to propositions. Namely, if P′ is the set of propositions appearing in ϕ, then for every finite set P such that P′ ⊆ P, a word in (2^P)^ω is a model. We denote by σ(i) the set of propositions at position i, that is, σ = σ(0), σ(1), …. We present an inductive definition of when a formula holds in model σ at position i:

– For p ∈ P we have σ, i ⊨ p iff p ∈ σ(i).
– σ, i ⊨ ¬ϕ iff σ, i ⊭ ϕ.
– σ, i ⊨ ϕ ∨ ψ iff σ, i ⊨ ϕ or σ, i ⊨ ψ.
– σ, i ⊨ ◯ϕ iff σ, i+1 ⊨ ϕ.
– σ, i ⊨ ϕ U ψ iff there exists k ≥ i such that σ, k ⊨ ψ and σ, j ⊨ ϕ for all j, i ≤ j < k.

For a formula ϕ and a position j ≥ 0 such that σ, j ⊨ ϕ, we say that ϕ holds at position j of σ. If σ, 0 ⊨ ϕ we say that ϕ holds on σ and denote this by σ ⊨ ϕ. A set of models L satisfies ϕ, denoted L ⊨ ϕ, if every model in L satisfies ϕ.

We are interested in the question of realizability of LTL specifications [PR89b]. Assume two sets of variables X and Y. Intuitively, X is the set of input variables controlled by the environment and Y is the set of system variables. With no loss of generality, we assume that all variables are Boolean. Obviously, the more general case where X and Y range over arbitrary finite domains can be reduced to the Boolean case. Realizability amounts to checking whether there exists an open controller that satisfies the specification. Such a controller can be represented as an automaton which, at any step, inputs values of the X variables and outputs values for the Y variables. Below we formalize the notion of checking realizability and synthesis, namely, the construction of such controllers.
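As an illustration of the semantics just defined, here is a minimal Python sketch (assumptions: formulas are nested tuples, and models are ultimately periodic words given as a lasso prefix plus loop; this is not taken from the paper). The until case is computed as a least fixpoint, mirroring the inductive definition:

    def succ(i, prefix, loop):
        n = len(prefix) + len(loop)
        return i + 1 if i + 1 < n else len(prefix)   # wrap back into the loop

    def sat(phi, prefix, loop):
        """Set of positions of the lasso prefix.loop^omega where phi holds."""
        n, word = len(prefix) + len(loop), prefix + loop
        tag = phi[0]
        if tag == 'ap':                              # atomic proposition
            return {i for i in range(n) if phi[1] in word[i]}
        if tag == 'not':
            return set(range(n)) - sat(phi[1], prefix, loop)
        if tag == 'or':
            return sat(phi[1], prefix, loop) | sat(phi[2], prefix, loop)
        if tag == 'next':
            s = sat(phi[1], prefix, loop)
            return {i for i in range(n) if succ(i, prefix, loop) in s}
        if tag == 'until':                           # phi1 U phi2
            s1, s2 = sat(phi[1], prefix, loop), sat(phi[2], prefix, loop)
            res = set(s2)
            while True:                              # least fixpoint
                new = {i for i in range(n)
                       if i in s1 and succ(i, prefix, loop) in res} - res
                if not new:
                    return res
                res |= new
        raise ValueError('unknown operator: %s' % tag)

    # Example: p U q holds at position 0 of the word {p}{p}({q})^omega.
    print(0 in sat(('until', ('ap', 'p'), ('ap', 'q')), [{'p'}, {'p'}], [{'q'}]))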
Realizability for LTL specifications is 2EXPTIME-complete [PR90]. We are interested in a subset of LTL for which we solve realizability and synthesis in polynomial time. The specifications we consider are of the form ϕ = ϕ_e → ϕ_s. We require that ϕ_α, for α ∈ {e, s}, can be rewritten as a conjunction of the following parts:

– ϕ_i^α: a Boolean formula which characterizes the initial states of the implementation.
– ϕ_t^α: a formula of the form ⋀_{i∈I} □B_i, where each B_i is a Boolean combination of variables from X ∪ Y and expressions of the form ◯v, where v ∈ X if α = e, and v ∈ X ∪ Y otherwise.
– ϕ_g^α: a formula of the form ⋀_{i∈I} □◇B_i, where each B_i is a Boolean formula.

It turns out that most of the specifications written in practice can be rewritten to this format. (In practice, the specification is usually given in this format: it is a collection of assumptions and requirements with the semantics that all assumptions imply all requirements, and every assumption or requirement is usually a very simple formula similar to the required form.) In Section 7 we also discuss cases where the formulas ϕ_g^α contain sub-formulas of the form □(p → ◇q), where p and q are Boolean formulas, as well as additional cases which can be converted to the GR(1) format.

2.2 Game Structures

We reduce the realizability problem of an LTL formula to deciding the winner in games. We consider two-player games played between a system and an environment. The goal of the system is to satisfy the specification regardless of the actions of the environment. Formally, we have the following. A game structure (GS) G : ⟨V, X, Y, Θ, ρ_e, ρ_s, ϕ⟩ consists of the following components.

• V = {u_1, …, u_n}: A finite set of typed state variables over finite domains. With no loss of generality, we assume they are all Boolean. We define a state s to be an interpretation of V, assigning to each variable u ∈ V a value s[u] ∈ {0, 1}. We denote by Σ the set of all states. We extend the evaluation function s[·] to Boolean expressions over V in the usual way. An assertion is a Boolean formula over V. A state s satisfies an assertion ϕ, denoted s ⊨ ϕ, if s[ϕ] = true. We say that s is a ϕ-state if s ⊨ ϕ.
• X ⊆ V is a set of input variables. These are variables controlled by the environment. Let D_X denote the possible valuations of the variables in X.
• Y = V \ X is a set of output variables. These are variables controlled by the system. Let D_Y denote the possible valuations of the variables in Y.
• Θ is the initial condition. This is an assertion characterizing all the initial states of G. A state is called initial if it satisfies Θ.
• ρ_e(X, Y, X′) is the transition relation of the environment. This is an assertion, relating a state s ∈ Σ to a possible next input value ξ′ ∈ D_X, by referring to unprimed copies of X and Y and primed copies of X. The transition relation ρ_e identifies a valuation ξ′ ∈ D_X as a possible input in state s if (s, ξ′) ⊨ ρ_e(X, Y, X′), where (s, ξ′) is the joint interpretation which interprets u ∈ V as s[u] and, for v ∈ X, interprets v′ as ξ′[v].
• ρ_s(X, Y, X′, Y′) is the transition relation of the system. This is an assertion, relating a state s ∈ Σ and an input value ξ′ ∈ D_X to a next output value η′ ∈ D_Y, by referring to primed and unprimed copies of V. The transition relation ρ_s identifies a valuation η′ ∈ D_Y as a possible output in state s reading input ξ′ if (s, ξ′, η′) ⊨ ρ_s(V, V′), where (s, ξ′, η′) is the joint interpretation which interprets u ∈ X as s[u], u′ as ξ′[u], and similarly for v ∈ Y.
• ϕ is the winning condition, given by an LTL formula.
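For illustration, a toy explicit-state encoding of a game structure (not the paper's symbolic BDD representation; all names are hypothetical) can enumerate D_X, D_Y and the two transition relations directly:

    from itertools import combinations

    def powerset(vs):
        return [frozenset(c) for r in range(len(vs) + 1)
                for c in combinations(vs, r)]

    def make_game(X, Y, rho_e, rho_s):
        """Enumerate a GS: a state is the frozenset of variables set to 1;
        rho_e(s, xi) and rho_s(s, xi, eta) are predicates over a state and
        valuations (frozensets over X, resp. Y)."""
        states = [x | y for x in powerset(X) for y in powerset(Y)]
        inputs = {s: [x for x in powerset(X) if rho_e(s, x)] for s in states}
        moves = {(s, x): {x | y for y in powerset(Y) if rho_s(s, x, y)}
                 for s in states for x in inputs[s]}
        return set(states), inputs, moves

    # Example: one input r, one output g; the environment is unconstrained,
    # and the system must set the next g equal to the next r.
    g = make_game({'r'}, {'g'},
                  lambda s, x: True,
                  lambda s, x, y: ('r' in x) == ('g' in y))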
For two states s and s′ of G, s′ is a successor of s if (s, s′) ⊨ ρ_e ∧ ρ_s. We freely switch between (s, ξ′) ⊨ ρ_e and ρ_e(s, ξ′) = 1, and similarly for ρ_s. A play σ of G is a maximal sequence of states σ : s_0, s_1, … satisfying initiality, namely s_0 ⊨ Θ, and consecution, namely, for each j ≥ 0, s_{j+1} is a successor of s_j. Let G be a GS and σ be a play of G. From a state s, the environment chooses an input ξ′ ∈ D_X such that ρ_e(s, ξ′) = 1, and the system chooses an output η′ ∈ D_Y such that ρ_s(s, ξ′, η′) = ρ_s(s, s′) = 1. A play σ is winning for the system if it is infinite and it satisfies ϕ. Otherwise, σ is winning for the environment.

A strategy for the system is a partial function f : Σ⁺ × D_X → D_Y such that if σ = s_0, …, s_n, then for every ξ′ ∈ D_X such that ρ_e(s_n, ξ′) = 1 we have ρ_s(s_n, ξ′, f(σ, ξ′)) = 1. Let f be a strategy for the system, and s_0 ∈ Σ. A play s_0, s_1, … is said to be compliant with strategy f if for all i ≥ 0 we have f(s_0, …, s_i, s_{i+1}[X]) = s_{i+1}[Y], where s_{i+1}[X] and s_{i+1}[Y] are the restrictions of s_{i+1} to the variable sets X and Y, respectively. Strategy f is winning for the system from state s ∈ Σ if all s-plays (plays departing from s) which are compliant with f are winning for the system. We denote by W_s the set of states from which there exists a winning strategy for the system. A strategy for the environment, a winning strategy, and the winning set W_e are defined dually. A GS G is said to be winning for the system if all initial states are winning for the system.

Given an LTL specification ϕ_e → ϕ_s as explained above and sets of input and output variables X and Y, we construct a GS as follows. Let ϕ_α = ϕ_i^α ∧ ϕ_t^α ∧ ϕ_g^α for α ∈ {e, s}. Then, for Θ we take ϕ_i^e ∧ ϕ_i^s. Let ϕ_t^α = ⋀_{i∈I} □B_i; then ρ_α = ⋀_{i∈I} τ(B_i), where the translation τ replaces each instance of ◯v by v′. Finally, we set ϕ = ϕ_g^e → ϕ_g^s. We solve the game, attempting to decide whether the game is winning for the environment or the system. If the environment is winning, the specification is unrealizable. If the system is winning, we synthesize a winning strategy, which is a working implementation for the system, as explained in Section 4.
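A sketch of this construction (illustrative only; formulas are nested tuples, with the assumption, guaranteed by the format of ϕ_t, that ◯ is applied to variables only):

    def tau(b):
        """Replace ('next', ('var', v)) by a primed variable ('var', v')."""
        if b[0] == 'next':
            return ('var', b[1][1] + "'")
        if b[0] in ('var', 'cst'):
            return b
        return (b[0],) + tuple(tau(a) for a in b[1:])

    def build_game(phi_e, phi_s):
        """phi_alpha = (init, [B_1, ..., B_k], [liveness goals])."""
        theta = ('and', phi_e[0], phi_s[0])
        rho_e = [tau(b) for b in phi_e[1]]     # conjuncts of rho_e
        rho_s = [tau(b) for b in phi_s[1]]     # conjuncts of rho_s
        goal = ('implies', ('and_gf', phi_e[2]), ('and_gf', phi_s[2]))
        return theta, rho_e, rho_s, goal

    # rho conjunct for [](r -> next g) becomes ('implies', ('var','r'), ('var',"g'")):
    print(tau(('implies', ('var', 'r'), ('next', ('var', 'g')))))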
2.3 Fair Discrete Systems

We present implementations as a special case of fair discrete systems (FDS) [KP00]. An FDS D : ⟨V, Θ, ρ, J, C⟩ consists of the following components.

• V = {u_1, …, u_n}: A finite set of Boolean variables. We define a state s to be an interpretation of V. Denote by Σ the set of all states. Assertions over V and satisfaction of assertions are defined as in games.
• Θ: The initial condition. This is an assertion characterizing all the initial states of the FDS. A state is called initial if it satisfies Θ.
• ρ: A transition relation. This is an assertion ρ(V, V′), relating a state s ∈ Σ to its D-successor s′ ∈ Σ.
• J = {J_1, …, J_m}: A set of justice requirements (weak fairness). Each requirement J ∈ J is an assertion which is intended to hold infinitely many times in every computation.
• C = {(p_1, q_1), …, (p_n, q_n)}: A set of compassion requirements (strong fairness). Each requirement (p, q) ∈ C consists of a pair of assertions, such that if a computation contains infinitely many p-states, it should also contain infinitely many q-states.

We define a run of the FDS D to be a maximal sequence of states σ : s_0, s_1, … satisfying the following requirements:

• Initiality: s_0 is initial, i.e., s_0 ⊨ Θ.
• Consecution: For every j ≥ 0, the state s_{j+1} is a D-successor of the state s_j.

The sequence σ being maximal means that either σ is infinite, or σ = s_0, …, s_k and s_k has no D-successor. A run σ is defined to be a computation of D if it is infinite and satisfies the following additional requirements:

• Justice: For each J ∈ J, σ contains infinitely many J-positions, i.e., positions j ≥ 0 such that s_j ⊨ J.
• Compassion: For each (p, q) ∈ C, if σ contains infinitely many p-positions, it must also contain infinitely many q-positions.

We say that an FDS D implements a specification ϕ if every run of D is infinite and every computation of D satisfies ϕ. An FDS is said to be fairness-free if J = C = ∅. It is called a just transition system (JDS) if C = ∅. In general, we use FDSs in order to formalize reactive systems. When we formalize concurrent systems which communicate by shared variables, as well as most digital designs, the ensuing formal model is that of a JDS (i.e., compassion-free). Compassion is needed only in the case that the program uses built-in synchronization constructs such as semaphores or synchronous communication.

For every FDS, there exists an LTL formula ϕ_D, called the temporal semantics of D, which fully characterizes the computations of D. It can be written as:

    ϕ_D : Θ ∧ □ρ(V, ◯V) ∧ ⋀_{J∈J} □◇J ∧ ⋀_{(p,q)∈C} (□◇p → □◇q)

where ρ(V, ◯V) is the formula obtained from ρ(V, V′) by replacing each instance of a primed variable x′ by the LTL formula ◯x. Note that in the case that D is compassion-free (i.e., it is a JDS), its temporal semantics has the form

    ϕ_D : Θ ∧ □ρ(V, ◯V) ∧ ⋀_{J∈J} □◇J

It follows that the class of specifications we consider in this paper, as explained at the end of Subsection 2.1, has the form ϕ = ϕ_e → ϕ_s, where each ϕ_α, for α ∈ {e, s}, is the temporal semantics of a JDS. Thus, if the specification can be realized by an environment which is a JDS and a system which is a JDS (in particular, if none of them requires compassion for their implementation), then the class of specifications we consider here is as general as necessary. Note in particular that hardware designs rarely assume compassion (strong fairness) as a built-in construct. Thus, we expect most specifications to be realized by hardware designs to fall in the class of GR(1).

3 µ-Calculus and Games

In [KPP05], we considered the case of GR(1) games (called there generalized Streett(1) games). In these games the winning condition is an implication between conjunctions of recurrence formulas (□◇ϕ where ϕ is a Boolean formula). These are exactly the types of goals in the games we defined in Section 2. We showed how to solve such games in cubic time [KPP05]. We re-explain here how to compute the winning regions of each of the players and explain how to use the algorithm to extract a winning strategy. We start with a definition of µ-calculus over game structures. We give the µ-calculus formula that characterizes the set of winning states of the system. We explain how we construct from this µ-calculus formula an algorithm to compute the set of winning states.
Finally, by saving intermediate values in the computation, we can construct a winning strategy and synthesize an FDS that implements the goal.

3.1 µ-Calculus over Game Structures

We define µ-calculus [Koz83] over game structures. Let G : ⟨V, X, Y, Θ, ρ_e, ρ_s, ϕ⟩ be a GS. For every variable v ∈ V the formulas v and ¬v are atomic formulas. Let Var = {X, Y, …} be a set of relational variables. The µ-calculus formulas are constructed as follows (⊙ and ⊖ denote the next-step operators controlled by the system and the environment, respectively; their semantics are given below):

    ϕ ::= v | ¬v | X | ϕ ∨ ϕ | ϕ ∧ ϕ | ⊙ϕ | ⊖ϕ | µXϕ | νXϕ

A formula ψ is interpreted as the set of G-states in Σ in which ψ is true. We write such a set of states as [[ψ]]_G^e, where G is the GS and e : Var → 2^Σ is an environment. The environment assigns to each relational variable a subset of Σ. We denote by e[X ← S] the environment such that e[X ← S](X) = S and e[X ← S](Y) = e(Y) for Y ≠ X. The set [[ψ]]_G^e is defined inductively as follows.

• [[v]]_G^e = {s ∈ Σ | s[v] = 1}
• [[¬v]]_G^e = {s ∈ Σ | s[v] = 0}
• [[X]]_G^e = e(X)
• [[ϕ ∨ ψ]]_G^e = [[ϕ]]_G^e ∪ [[ψ]]_G^e
• [[ϕ ∧ ψ]]_G^e = [[ϕ]]_G^e ∩ [[ψ]]_G^e
• [[⊙ϕ]]_G^e = { s ∈ Σ | ∀x′, (s, x′) ⊨ ρ_e → ∃y′ such that (s, x′, y′) ⊨ ρ_s and (x′, y′) ∈ [[ϕ]]_G^e }

  A state s is included in [[⊙ϕ]]_G^e if the system can force the play to reach a state in [[ϕ]]_G^e. That is, regardless of how the environment moves from s, the system can choose an appropriate move into [[ϕ]]_G^e.

• [[⊖ϕ]]_G^e = { s ∈ Σ | ∃x′ such that (s, x′) ⊨ ρ_e and ∀y′, (s, x′, y′) ⊨ ρ_s → (x′, y′) ∈ [[ϕ]]_G^e }

  A state s is included in [[⊖ϕ]]_G^e if the environment can force the play to reach a state in [[ϕ]]_G^e. As the environment moves first, it chooses an input x′ ∈ D_X such that for all choices of the system the successor s′ is in [[ϕ]]_G^e.

• [[µXϕ]]_G^e = ∪_i S_i, where S_0 = ∅ and S_{i+1} = [[ϕ]]_G^{e[X←S_i]}
• [[νXϕ]]_G^e = ∩_i S_i, where S_0 = Σ and S_{i+1} = [[ϕ]]_G^{e[X←S_i]}

(These fixpoint characterizations hold for finite game structures.) When all the variables in ϕ are bound by either µ or ν, the initial environment is not important and we simply write [[ϕ]]_G. In case G is clear from the context we write [[ϕ]]. The alternation depth of a formula is the number of alternations in the nesting of least and greatest fixpoints. A µ-calculus formula defines a symbolic algorithm for computing [[ϕ]] [EL86]. For a µ-calculus formula of alternation depth k, the run time of this algorithm is O(|Σ|^k). For a full exposition of the µ-calculus we refer the reader to [Eme97].

We often abuse notation and write a µ-calculus formula ϕ instead of the set [[ϕ]]. In some cases, instead of using a very complex formula, it may be more readable to use vector notation, as in Equation (2) below:

    ϕ = ν [ Z_1 = µY (⊙Y ∨ p ∧ ⊙Z_2)
            Z_2 = µY (⊙Y ∨ q ∧ ⊙Z_1) ]    (2)

Such a formula may be viewed as the mutual fixpoint of the variables Z_1 and Z_2 or, equivalently, as an equal formula where a single variable Z replaces both Z_1 and Z_2 and ranges over pairs of states [Lic91]. The formula above characterizes the set of states from which the system can force the game to visit p-states infinitely often and q-states infinitely often. We can characterize the same set of states by the following 'normal' formula:

    ϕ = νZ ([µY (⊙Y ∨ p ∧ ⊙Z)] ∧ [µY (⊙Y ∨ q ∧ ⊙Z)])

(This does not suggest a canonical translation from vector formulas to plain formulas; however, the same translation works for the formula in Equation (3) below. Note that the formulas in Equations (2) and (3) have a very similar structure.)

3.2 Solving GR(1) Games

Let G be a game where the winning condition is of the following form:

    ϕ = ⋀_{i=1}^{m} □◇J_i^1 → ⋀_{j=1}^{n} □◇J_j^2

Here the J_i^1 and J_j^2 are Boolean formulas. In [KPP05] we termed these games generalized Streett(1) games and provided the following µ-calculus formula to solve them. Let j ⊕ 1 = (j mod n) + 1.
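For illustration, the ⊙ operator and the fixpoint evaluations admit a direct explicit-state reading. The sketch below (hypothetical names; it assumes the (states, inputs, moves) game encoding of the earlier sketch) computes the 'normal' formula for □◇p ∧ □◇q:

    def cpre(game, target):
        """[[ (.) target ]]: states where, for every input allowed by rho_e,
        the system has some rho_s-move whose successor lies in target."""
        states, inputs, moves = game
        return {s for s in states
                if all(any(t in target for t in moves[(s, x)])
                       for x in inputs[s])}

    def lfp(f, bottom):
        cur = bottom
        while True:
            new = f(cur)
            if new == cur:
                return cur
            cur = new

    def gfp(f, top):
        cur = top
        while True:
            new = f(cur)
            if new == cur:
                return cur
            cur = new

    def win_gf_p_and_gf_q(game, p, q):
        """nu Z.([mu Y.(cpre Y | p & cpre Z)] & [mu Y.(cpre Y | q & cpre Z)])"""
        states, _, _ = game
        def reach(goal, Z):
            return lfp(lambda Y: cpre(game, Y) | (goal & cpre(game, Z)), set())
        return gfp(lambda Z: reach(p, Z) & reach(q, Z), set(states))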
    ϕ = ν [ Z_1 = µY ⋁_{i=1}^{m} νX (J_1^2 ∧ ⊙Z_2 ∨ ⊙Y ∨ ¬J_i^1 ∧ ⊙X)
            Z_2 = µY ⋁_{i=1}^{m} νX (J_2^2 ∧ ⊙Z_3 ∨ ⊙Y ∨ ¬J_i^1 ∧ ⊙X)
            ⋮
            Z_n = µY ⋁_{i=1}^{m} νX (J_n^2 ∧ ⊙Z_1 ∨ ⊙Y ∨ ¬J_i^1 ∧ ⊙X) ]    (3)

Intuitively, for j ∈ [1..n] and i ∈ [1..m], the greatest fixpoint νX(J_j^2 ∧ ⊙Z_{j⊕1} ∨ ⊙Y ∨ ¬J_i^1 ∧ ⊙X) characterizes the set of states from which the system can force the play either to stay indefinitely in ¬J_i^1 states (thus violating the left-hand side of the implication) or to reach, in a finite number of steps, a state in the set J_j^2 ∧ ⊙Z_{j⊕1} ∨ ⊙Y. The two outer fixpoints make sure that the system wins from the set J_j^2 ∧ ⊙Z_{j⊕1} ∨ ⊙Y. The least fixpoint µY makes sure that the unconstrained phase of a play, represented by the disjunct ⊙Y, is finite and ends in a J_j^2 ∧ ⊙Z_{j⊕1} state. Finally, the greatest fixpoint νZ_j is responsible for ensuring that, after visiting J_j^2, we can loop and visit J_{j⊕1}^2 and so on. By the cyclic dependence of the outermost greatest fixpoint, either all the sets J_j^2 are visited, or the play gets stuck in some inner greatest fixpoint, where some J_i^1 is visited only finitely many times.

We include in Fig. 1 a (slightly simplified) version of the implementation of this µ-calculus formula in TLV (see Section 5). We denote J_i^α, for α ∈ {1, 2}, by Ji(i,α), and ⊙ by cox. We denote conjunction, disjunction, and negation by &, |, and ! respectively. A GreatestFixpoint loop on variable u starts by setting the initial value of u to the set of all states, and a LeastFixpoint loop over u starts by setting u to the empty set of states. For both types of fixpoints, the loop terminates when two successive values of u are the same. The greatest fixpoint GreatestFixpoint(x <= z) means that the initial value of x is z instead of the universal set of all states. We use the sets y[j][r] and their subsets x[j][r][i] to define n strategies for the system.

    Func winm(m, n);
      GreatestFixpoint(z)
        For (j in 1...n)
          Let r := 1;
          LeastFixpoint (y)
            Let start := Ji(j,2) & cox(z) | cox(y);
            Let y := 0;
            For (i in 1...m)
              GreatestFixpoint (x <= z)
                Let x := start | !Ji(i,1) & cox(x);
              End -- GreatestFixpoint (x)
              Let x[j][r][i] := x;   -- store values of x
              Let y := y | x;
            End -- For (i in 1...m)
            Let y[j][r] := y;        -- store values of y
            Let r := r + 1;
          End -- LeastFixpoint (y)
          Let z := y;
          Let maxr[j] := r - 1;
        End -- For (j in 1...n)
      End -- GreatestFixpoint (z)
      Return z;
    End -- Func winm(m, n);

    Fig. 1. TLV implementation of Equation (3)

The strategy f_j is defined on the states in Z_j. We show that the strategy f_j either forces the play to visit J_j^2 and then proceed to Z_{j⊕1}, or eventually avoids some J_i^1. We show that by combining these strategies, either the system switches strategies infinitely many times, ensuring that the play is winning according to the right-hand side of the implication, or it eventually uses a fixed strategy, ensuring that the play does not satisfy the left-hand side of the implication. Essentially, the strategies are "go to y[j][r] for minimal r" until reaching a J_j^2 state and then switching to strategy j ⊕ 1, or "stay in x[j][r][i]".
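An explicit-state transcription of winm in Python may clarify the triple fixpoint (a sketch under the same assumptions as the previous one, reusing its hypothetical cpre, lfp and gfp helpers; the y[j][r] and x[j][r][i] sets are stored for the strategy construction of Section 4):

    def winm(game, J1, J2):
        """J1[i], J2[j] are sets of states; returns (Z, y_sets, x_sets)."""
        states, _, _ = game
        states = set(states)
        n, m = len(J2), len(J1)
        Z = set(states)
        x_sets, y_sets = {}, {}
        while True:                          # outer GreatestFixpoint(z)
            Z_old = set(Z)
            for j in range(n):
                y_sets[j], x_sets[j] = {}, {}
                y, r = set(), 1
                while True:                  # LeastFixpoint(y)
                    start = (J2[j] & cpre(game, Z)) | cpre(game, y)
                    y_new, xs = set(y), {}
                    for i in range(m):
                        # GreatestFixpoint(x <= z): start from Z, not all states
                        x = gfp(lambda X: start |
                                          ((states - J1[i]) & cpre(game, X)),
                                set(Z))
                        xs[i] = x
                        y_new |= x
                    x_sets[j][r] = xs
                    if y_new == y:
                        break
                    y_sets[j][r] = y_new
                    y, r = y_new, r + 1
                Z = set(y)
            if Z == Z_old:
                return Z, y_sets, x_sets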
It follows that we can solve realizability of LTL formulas of the form that interests us in polynomial (cubic) time.

Theorem 1. [KPP05] Given sets of variables X, Y whose set of possible valuations is Σ, and an LTL formula ϕ with m and n conjuncts, we can determine, using a symbolic algorithm, whether ϕ is realizable in time proportional to (nm|Σ|)³.

4 Synthesis

We show how to use the intermediate values in the computation of the fixpoint to produce an FDS that implements ϕ. The FDS basically follows the strategies explained above. Let X, Y, and ϕ be as above. Let G : ⟨V, X, Y, Θ, ρ_e, ρ_s, ϕ_g⟩ be the GS defined by X, Y, and ϕ (where V = X ∪ Y). We construct the following fairness-free FDS: D : ⟨V_D, X, Y_D, Θ_D, ρ⟩, where V_D = V ∪ {jx} and jx ranges over [1..n], Y_D = Y ∪ {jx}, and Θ_D = Θ ∧ (jx = 1). The variable jx is used to store internally which strategy should be applied. The transition relation ρ is ρ_1 ∨ ρ_2 ∨ ρ_3, where ρ_1, ρ_2, and ρ_3 are defined as follows.

Transition ρ_1 is taken when a J_j^2 state is reached and we change strategy from f_j to f_{j⊕1}. Accordingly, all the disjuncts in ρ_1 change jx. Transition ρ_2 is taken in the case that we can get closer to a J_j^2 state. These transitions go from states in some set y[j][r] to states in a set y[j][r′] where r′ < r. We take care to apply this transition only to states s for which r > 1 is the minimal index such that s ∈ y[j][r]. Transition ρ_3 is taken from states s ∈ x[j][r][i] such that s ⊨ ¬J_i^1, and the transition takes us back to states in x[j][r][i]. Repeating such a transition forever also leads to a legitimate computation, because it violates the environment requirement of infinitely many visits to J_i^1-states. Again, we take care to apply this transition only to states for which (r, i) are the (lexicographically) minimal indices such that s ∈ x[j][r][i].

Let y[j][<r] denote the set ⋃_{l∈[1..r−1]} y[j][l]. We write (r′, i′) ≺ (r, i) to denote that the pair (r′, i′) is lexicographically smaller than the pair (r, i), that is, either r′ < r, or r′ = r and i′ < i. Let x[j][≺(r, i)] denote the set ⋃_{(r′,i′)≺(r,i)} x[j][r′][i′]. The transitions are defined as follows:

    ρ_1 = ⋁_{j∈[1..n]} (jx = j) ∧ z ∧ J_j^2 ∧ ρ_e ∧ ρ_s ∧ z′ ∧ (jx′ = j⊕1)

    ρ_2(j) = ⋁_{r>1} y[j][r] ∧ ¬y[j][<r] ∧ ρ_e ∧ ρ_s ∧ y′[j][<r]
    ρ_2 = ⋁_{j∈[1..n]} (jx = jx′ = j) ∧ ρ_2(j)

    ρ_3(j) = ⋁_{r} ⋁_{i∈[1..m]} x[j][r][i] ∧ ¬x[j][≺(r, i)] ∧ ¬J_i^1 ∧ ρ_e ∧ ρ_s ∧ x′[j][r][i]
    ρ_3 = ⋁_{j∈[1..n]} (jx = jx′ = j) ∧ ρ_3(j)

The conjuncts ¬y[j][<r] and ¬x[j][≺(r, i)] appearing in transitions ρ_2(j) and ρ_3(j) ensure the minimality of the indices to which these transitions are respectively applied. Notice that the above transitions can be computed symbolically. We include below the TLV code that symbolically constructs the transition relation of the synthesized FDS and places it in trans. We denote the conjunction of ρ_e and ρ_s by trans12.

    To symb_strategy;
      Let trans := 0;
      For (j in 1...n)
        Let jp1 := (j mod n) + 1;
        Let trans := trans | (jx=j) & z & Ji(j,2) & trans12 &
                     next(z) & (next(jx)=jp1);
      End -- For (j in 1...n)
      For (j in 1...n)
        Let low := y[j][1];
        For (r in 2...maxr[j])
          Let trans := trans | (jx=j) & y[j][r] & !low & trans12 &
                       next(low) & (next(jx)=j);
          Let low := low | y[j][r];
        End -- For (r in 2...maxr[j])
      End -- For (j in 1...n)
      For (j in 1...n)
        Let low := 0;
        For (r in 2...maxr[j])
          For (i in 1...m)
            Let trans := trans | (jx=j) & x[j][r][i] & !low & !Ji(i,1) &
                         trans12 & next(x[j][r][i]) & (next(jx)=j);
            Let low := low | x[j][r][i];
          End -- For (i in 1...m)
        End -- For (r in 2...maxr[j])
      End -- For (j in 1...n)
    End -- To symb_strategy;

4.1 Minimizing the Strategy

We have created an FDS that implements an LTL goal ϕ. The set of variables of this FDS includes the given sets of input and output variables, as well as a 'memory' variable jx. We had quite a liberal policy of choosing the next successor in the case of a visit to J_j^2: we simply chose some successor in the winning set. Here we minimize (symbolically) the resulting FDS. A necessary condition for the soundness of this minimization is that the specification be insensitive to stuttering. (A specification is insensitive to stuttering if the result of doubling a letter, or replacing a double occurrence by a single occurrence, in a model is still a model. The specifications we consider are allowed to use the next operator, so they can be sensitive to stuttering; a specification that requires that in some case an immediate response be made would be sensitive to stuttering.)

Notice that our FDS is deterministic: for every state and every possible assignment to the variables in X ∪ Y there exists at most one successor state with this assignment. Thus, removing transitions seems to be of lesser importance, and we concentrate on removing redundant states. As we are using the given sets of variables X and Y, the only possible candidate states for merging are states that agree on the values of the variables in X ∪ Y and disagree on the value of jx. If we find two states s and s′ such that ρ(s, s′), s′[X ∪ Y] = s[X ∪ Y], and s′[jx] = s[jx]⊕1, we remove state s. We direct all its incoming arrows to s′ and remove its outgoing arrows. Intuitively, we can do this because for every computation that passes through s there exists a computation that stutters once in s′ instead (due to the assumption of stuttering insensitivity): the modified computation passes directly to s′ and still satisfies all the requirements (we know that stuttering here is allowed because there exists a transition from s to s′ which agrees with s on all variables in X ∪ Y).

As mentioned, this minimization is performed symbolically. As we discuss in Section 5, it turns out that the minimization actually increases the size of the resulting BDDs. It seems to us that for practical reasons we may want to keep the size of the BDDs minimal rather than minimize the automaton. The symbolic implementation of the minimization is given below. The transition obseq includes all possible assignments to V and V′ such that all variables except jx maintain their values. It is enough to consider the transitions from j to j⊕1 for all j, and then from n to j for all j, to remove all redundant states. This is because the original transition relation only allows increasing jx by one.
    For (j in 1..n)
      Let nextj := (j mod n)+1;
      reduce(j,nextj);
    End -- For (j in 1..n)
    For (j in 1..n-1)
      reduce(n,j);
    End -- For (j in 1..n-1)

    Func reduce(j,k)
      Let idle := trans & obseq & jx=j & next(jx)=k;
      Let states := idle forsome next(V);
      Let add_trans := ((trans & next(states) & next(jx)=j) forsome next(jx))
                       & next(jx)=k;
      Let rem_trans := next(states) & next(jx)=j | states & jx=j;
      Let add_init := ((init & states & jx=j) forsome jx) & jx=k;
      Let rem_init := states & jx=j;
      Let trans := (trans & !rem_trans) | add_trans;
      Let init := (init & !rem_init) | add_init;
      Return;
    End -- Func reduce(j,k)

5 Experimental Results

The algorithm described in this paper was implemented within the TLV system [PS96]. TLV is a flexible verification tool implemented at the Weizmann Institute of Science. TLV provides a programming environment which uses BDDs as its basic data type [Bry86]. Deductive and algorithmic verification methods are implemented as procedures written within this environment. We extended TLV's functionality by implementing the algorithms in this paper. We consider two examples: the case of an arbiter and the case of a lift controller.

5.1 Arbiter

We consider the case of an arbiter. Our arbiter has n input lines on which clients request permissions and n output lines on which the clients are granted permission. We assume that initially the requests are set to zero, that once a request has been made it cannot be withdrawn, and that the clients are fair, that is, once a grant to a certain client has been given, the client eventually releases the resource by lowering its request line. Formally, the assumption on the environment in LTL format is:

    ⋀_i ( ¬r_i ∧ □((r_i ≠ g_i) → (r_i = ◯r_i)) ∧ □((r_i ∧ g_i) → ◇¬r_i) )

We expect the arbiter to initially give no grants, give at most one grant at a time (mutual exclusion), give only requested grants, maintain a grant as long as it is requested, (eventually) satisfy every request, and take back grants that are no longer needed. Formally, the requirement on the system in LTL format is:

    ⋀_{i≠j} □¬(g_i ∧ g_j) ∧ ⋀_i ( ¬g_i ∧ □((r_i = g_i) → (g_i = ◯g_i)) ∧ □((r_i ∧ ¬g_i) → ◇g_i) ∧ □((¬r_i ∧ g_i) → ◇¬g_i) )

The resulting game is G : ⟨V, X, Y, Θ, ρ_e, ρ_s, ϕ⟩ where:

– X = {r_i | i = 1, …, n}
– Y = {g_i | i = 1, …, n}
– Θ = ⋀_i (¬r_i ∧ ¬g_i)
– ρ_e = ⋀_i ((r_i ≠ g_i) → (r_i′ = r_i))
– ρ_s = ⋀_{i≠j} ¬(g_i′ ∧ g_j′) ∧ ⋀_i ((r_i = g_i) → (g_i′ = g_i))
– ϕ = ⋀_i □((r_i ∧ g_i) → ◇¬r_i) → ⋀_i □((r_i ∧ ¬g_i) → ◇g_i) ∧ □((¬r_i ∧ g_i) → ◇¬g_i)

We simplify ϕ by replacing □((r_i ∧ g_i) → ◇¬r_i) by □◇¬(r_i ∧ g_i), and by replacing □((r_i ∧ ¬g_i) → ◇g_i) and □((¬r_i ∧ g_i) → ◇¬g_i) by □◇(r_i = g_i). The first simplification is allowed because, whenever r_i ∧ g_i holds, the next value of g_i is true. The second simplification is allowed because, whenever r_i ∧ ¬g_i or ¬r_i ∧ g_i holds, the next value of r_i is equal to the current one. This results in the simpler goal:

    ϕ = ⋀_i □◇¬(r_i ∧ g_i) → ⋀_i □◇(r_i = g_i)

In Fig. 2, we present graphs of the run time and the size of the resulting implementations for the Arbiter example. Implementation sizes are measured in number of BDD nodes. In Fig. 3 we include the explicit representation of the arbiter for two clients resulting from the application of our algorithm.

5.2 Lift Controller

We consider the case of a lift controller. We build a lift controller for n floors. We assume n button sensors. The lift may be requested on every floor; once the lift has been called on some floor, the request cannot be withdrawn. Initially, on all floors there are no requests.
Once a request has been fulfilled, it is removed. Formally, the assumption on the environment in LTL format is:

    ⋀_i ( ¬b_i ∧ □((b_i ∧ f_i) → ◯¬b_i) ∧ □((b_i ∧ ¬f_i) → ◯b_i) )

[Fig. 2. Running times and program size for the Arbiter example; plots of program size (×10³ BDD nodes) and execution time (seconds) against the number of clients, omitted.]

[Fig. 3. Arbiter for 2; the explicit nine-state diagram over the valuations of r_1 r_2 ; g_1 g_2, omitted.]

We expect the lift to initially start on the first floor. We model the location of the lift by an n-bit array; thus, we have to demand mutual exclusion on this array. The lift can move at most one floor at a time, and must eventually satisfy every request. Formally, the requirement on the system in LTL format is:

    □(up → sb) ∧ □◇(f_1 ∨ sb) ∧ ⋀_{i≠j} □¬(f_i ∧ f_j) ∧
    ⋀_i ( ((i = 1 ∧ f_i) ∨ (i ≠ 1 ∧ ¬f_i)) ∧ □(b_i → ◇f_i) ∧ □(f_i → ◯(f_{i−1} ∨ f_i ∨ f_{i+1})) )

where up = ⋁_i (f_i ∧ ◯f_{i+1}) denotes that the lift moves one floor up, and sb = ⋁_i b_i denotes that at least one button is pressed. The requirement □(up → sb) states that the lift should not move up unless some button is pressed. The liveness requirement □◇(f_1 ∨ sb) states that either some button is pressed infinitely many times, or the lift parks at floor 1 infinitely many times. Together they imply that when there is no active request, the lift should move down and park at floor 1.

In Fig. 4 we present graphs of the run time and the size of the resulting implementations for different numbers of floors. As before, implementation sizes are measured in number of BDD nodes.

[Fig. 4. Running times and program size for the Lift example; plots of program size (×10³ BDD nodes) and execution time (seconds) against the number of floors, omitted.]

6 Extensions

The class of specifications to which the N³-synthesis algorithm is applicable is wider than the limited form presented in Equation (1). The algorithm can be applied to any specification of the form (⋀_{i=1}^{m} ϕ_i) → (⋀_{j=1}^{n} ψ_j), where each ϕ_i and ψ_j can be specified by an LTL formula of the form □◇q for a past formula q. Equivalently, each ϕ_i and ψ_j should be specifiable by a deterministic Büchi automaton. This is, for example, the case of the original version of the Arbiter, where the liveness conjuncts were each a response formula of the form □(p → ◇q). The way we deal with such a formula is to add to the game additional variables and a transition relation which encodes the deterministic Büchi automaton. For example, to deal with a formula □(p → ◇q), we add to the game variables a new Boolean variable x with initial condition x = 1, and add to the transition relation ρ_e the additional conjunct

    x′ = q ∨ (x ∧ ¬p)

    Table 1. Experiments for Arbiter

    N    Recurrence Properties   Response Properties
    4    0.05                    0.33
    6    0.06                    0.89
    8    0.13                    1.77
    10   0.25                    3.04
    12   0.48                    4.92
    14   0.87                    7.30
    16   1.16                    10.57
    18   1.51                    15.05
    20   1.89                    20.70
    25   3.03                    43.69
    30   4.64                    88.19
    35   6.78                    170.50
    40   9.50                    317.33
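A tiny executable sketch of this monitor (assuming, as an illustration, that the update reads the current unprimed values of p and q; names are hypothetical) shows how x tracks pending requests along a trace; x is false exactly while a p-request awaits its q:

    def monitor_run(ps, qs):
        """Given the Boolean values of p and q along a trace, return the
        trace of the monitor variable x (initially true)."""
        x, xs = True, [True]
        for p, q in zip(ps, qs):
            x = q or (x and not p)   # pending p not yet answered by q
            xs.append(x)
        return xs

    # p raised at step 0 and answered at step 2: x dips and then recovers.
    print(monitor_run([1, 0, 0, 0], [0, 0, 1, 0]))
    # [True, False, False, True, True]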
Indeed, the in Table 1 we present the performance results of running the Arbiter example with the original specification, to which we applied the above transformation from response to recurrence formulas. The first column presents the results, when the liveness requirements are given as the recurrence formulas  (ri = gi ). In the second column, we present the results for the case that we started with the original requirements (ri →  )gi , and then transformed them into recurrence formulas according to the recipe presented above. 7 Conclusions We presented an algorithm that solves realizability and synthesis for a subset of LTL. For this subset the algorithm works in cubic time. We also presented an algorithm which reduces the number of states in the synthesized module for the case that the specification is stuttering insensitive. We have shown that the approach can be applied to wide class of formulas, which covers the full set of generalized reactivity[1] properties. We expect both the system and the environment to be realized by hardware designs. Thus, the temporal semantics of both the system and the environment have a specific form and the implication between the two falls in the set of formulas that we handle. Generalized reactivity[1] certainly covers all the specifications we have so far considered in the Prosyd project. We believe that modifications similar to the ones described in Section 6 would be enough to allow coverage of specifications given in languages such as PSL or FORSPEC [AO04, AFF+ 02]. Acknowledgments We thank P. Madhusudan for suggesting that enumerating the states of the controller may be very inefficient. References [AFF+ 02] [AMPS98] [AO04] R. Armoni, L. Fix, A. Flaisher, R. Gerth, B. Ginsburg, T. Kanza, A. Landver, S. Mador-Haim, E. Singerman, A. Tiemeyer, M. Vardi, and Y. Zbar. The ForSpec temporal logic: A new temporal property-specification language. In 8th TACAS, LNCS 2280, 2002. E. Asarin, O. Maler, A. Pnueli, and J. Sifakis. Controller synthesis for timed automata. In IFAC Symposium on System Structure and Control, pages 469–474. Elsevier, 1998. Inc. Accellera Organization. Formal semantics of Accellera(c) property specification language. Appendix B of http://www.eda.org/vfv/docs/PSL-v1.1.pdf, January 2004. 380 [AT04] [BL69] [Bry86] [CE81] [Chu63] [EL86] [Eme97] [Koz83] [KP00] [KPP05] [Lic91] [MW84] [PR89a] [PR89b] [PR90] [PS96] [Rab72] N. Piterman, A. Pnueli, and Y. Sa’ar R. Alur and S. La Torre. Deterministic generators and games for LTL fragments. ACM Trans. Comput. Log., 5(1):1–25, 2004. J.R. Büchi and L.H. Landweber. Solving sequential conditions by finite-state strategies. Trans. Amer. Math. Soc., 138:295–311, 1969. R.E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35(12):1035–1044, 1986. E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using branching time temporal logic. In Proc. IBM Workshop on Logics of Programs, volume 131 of Lect. Notes in Comp. Sci., pages 52–71. Springer-Verlag, 1981. A. Church. Logic, arithmetic and automata. In Proc. 1962 Int. Congr. Math., pages 23–25. E. A. Emerson and C. L. Lei. Efficient model-checking in fragments of the propositional modal µ-calculus. In Proc. First IEEE Symp. Logic in Comp. Sci., pages 267–278, 1986. E.A. Emerson. Model checking and the µ-calculus. In N. Immerman and Ph.G. Kolaitis, editors, Descriptive Complexity and Finite Models, pages 185–214. AMS, 1997. D. Kozen. Results on the propositional µ-calculus. 
[KP00] Y. Kesten and A. Pnueli. Verification by augmented finitary abstraction. Inf. and Comp., 163:203–243, 2000.
[KPP05] Y. Kesten, N. Piterman, and A. Pnueli. Bridging the gap between fair simulation and trace inclusion. Inf. and Comp., 200(1):36–61, 2005.
[Lic91] O. Lichtenstein. Decidability, Completeness, and Extensions of Linear Time Temporal Logic. PhD thesis, Weizmann Institute of Science, 1991.
[MW84] Z. Manna and P. Wolper. Synthesis of communicating processes from temporal logic specifications. ACM Trans. Prog. Lang. Sys., 6:68–93, 1984.
[PR89a] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. 16th ACM Symp. Princ. of Prog. Lang., pages 179–190, 1989.
[PR89b] A. Pnueli and R. Rosner. On the synthesis of an asynchronous reactive module. In Proc. 16th Int. Colloq. Aut. Lang. Prog., volume 372 of Lect. Notes in Comp. Sci., pages 652–671. Springer-Verlag, 1989.
[PR90] A. Pnueli and R. Rosner. Distributed reactive systems are hard to synthesize. In Proc. 31st IEEE Symp. Found. of Comp. Sci., pages 746–757, 1990.
[PS96] A. Pnueli and E. Shahar. A platform for combining deductive with algorithmic verification. In Proc. 8th Intl. Conference on Computer Aided Verification (CAV'96), volume 1102 of Lect. Notes in Comp. Sci., pages 184–195. Springer-Verlag, 1996.
[Rab72] M.O. Rabin. Automata on Infinite Objects and Church's Problem, volume 13 of Regional Conference Series in Mathematics. Amer. Math. Soc., 1972.

Systematic Construction of Abstractions for Model-Checking

Arie Gurfinkel, Ou Wei, and Marsha Chechik

Department of Computer Science, University of Toronto. {arie, owei, chechik}@cs.toronto.edu

Abstract. This paper describes a framework, based on Abstract Interpretation, for creating abstractions for model-checking. Specifically, we study how to abstract models of the µ-calculus and systematically derive abstractions that are constructive, sound, and precise, and apply them to abstracting Kripke structures. The overall approach is based on the use of bilattices to represent partial and inconsistent information.

1 Introduction

Abstraction plays a fundamental role in combating state-space explosion in model-checking. The goal of abstraction is to construct an abstract model of a system which is small enough to be effectively analyzed, and yet rich enough to yield conclusive results. The success of current abstraction projects, such as SLAM [2] and Bandera [6], indicates that abstraction is an effective technique for enabling model-checking of realistic software systems.

In model-checking, a notion of abstracting a transition system is typically developed as follows: (1) an abstract statespace is defined such that each abstract state corresponds to a set of concrete states; this correspondence can be arbitrary, as in predicate abstraction [17], or influenced by the concrete statespace, as in symmetry reduction [12]; (2) an abstract transition system is constructed by defining a transition relation over this abstract statespace; (3) finally, the resulting system is argued to be sound, i.e., it is shown to preserve a fragment of the desired temporal logic.

The problem with the above approach is that it is not algorithmic: the techniques used to construct the abstract systems require a certain amount of intuition from users, and extra effort is needed to show that the resulting abstraction is correct. This makes it difficult to understand a specific abstraction method and improve on it.
For example, given an abstraction that preserves universal CTL, how should it be changed to preserve the entire CTL? It is also difficult to understand the relationship between different abstraction methods. For example, as shown in [25], predicate abstraction and symmetry reduction differ only in their choice of abstract states. However, this insight was not apparent just from the description of these methods. Given the role abstraction plays in the model-checking process, we believe it is essential to create a general methodology for systematically constructing and analyzing abstractions. In the context of static analysis of programs, such a framework, called Abstract Interpretation (AI), has already been proposed by [7]. It provides a collection of notations and tools to formalize the approximation of program semantics, as well as to design and analyze program abstractions.

The goal of this paper is to specialize the AI framework to model-checking. There are a number of ways to do this specialization, given the breadth of model-checking approaches. Our goal here is to create abstractions that preserve properties expressed in the modal µ-calculus [20] (Lµ). Following the recipes of AI, we systematically derive conditions under which an abstract Lµ model is the best abstraction of a concrete one. We guarantee that these abstract models are (a) sound, i.e., if an Lµ formula is satisfied in the abstract model, it is satisfied in the concrete one, (b) most precise, i.e., satisfy the most properties, and (c) have the desired structural characteristics, e.g., a requirement that an abstraction of a transition system is a transition system as well. These conditions are constructive and, as we show in this paper, can be derived almost mechanically. The algorithm for building a desired abstraction follows from these conditions directly.

The logic Lµ includes negation, so that an Lµ formula ¬ϕ is satisfied iff ϕ is refuted. If we assume that every formula is either satisfied or refuted in an abstraction as well, it may seem that preserving soundness for all Lµ formulas means that such an abstraction must satisfy and refute exactly the same properties as the corresponding concrete model (resulting in a bisimilar model). If the goal is to save space for model-checking, this abstraction would be very limited. Thus, most existing abstractions are restricted to fragments of Lµ, i.e., only to the universal or only to the existential properties (see, e.g., [22]). The insight we use in this paper is that an abstraction is inherently incomplete: some formulas may be neither satisfied nor refuted by it. We propose to treat satisfaction and refutation independently. If we classify all Lµ formulas using a pair ⟨Sat, Ref⟩, where Sat contains all the formulas satisfied in an abstract model, while Ref contains all the refuted ones, then Sat and Ref are not necessarily complements of each other. In fact, Sat and Ref may not even be disjoint, allowing some formulas to be both satisfied and refuted. Knowledge about the truth and falsity of every piece of evidence can be naturally encoded using the 4-valued Belnap logic [3], which enjoys the nice mathematical properties associated with bilattices [15, 14].
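To ground the ⟨Sat, Ref⟩ intuition, the following is a minimal Python sketch of Belnap values as pairs of evidence bits, anticipating the pair construction B(2) of Section 3.1; the class and operation names are ours, chosen for illustration only.

    from typing import NamedTuple

    class Belnap(NamedTuple):
        """A Belnap value as a pair (sat, ref): evidence for truth, evidence for falsity."""
        sat: bool
        ref: bool

        def neg(self):                 # negation swaps the two kinds of evidence
            return Belnap(self.ref, self.sat)

        def conj(self, o):             # truth-ordering meet (conjunction)
            return Belnap(self.sat and o.sat, self.ref or o.ref)

        def disj(self, o):             # truth-ordering join (disjunction)
            return Belnap(self.sat or o.sat, self.ref and o.ref)

        def widen(self, o):            # information meet: keep only common knowledge
            return Belnap(self.sat and o.sat, self.ref and o.ref)

        def narrow(self, o):           # information join: pool all knowledge
            return Belnap(self.sat or o.sat, self.ref or o.ref)

    T, F = Belnap(True, False), Belnap(False, True)   # classical true and false
    M, D = Belnap(False, False), Belnap(True, True)   # "unknown" and "inconsistent"

    assert T.neg() == F and M.neg() == M and D.neg() == D
    assert T.widen(F) == M     # true and false share no knowledge
    assert T.narrow(F) == D    # asserting both yields inconsistency

Note that negation merely swaps the two components, so it can neither lose nor gain information; this is exactly the bilattice property exploited later in the paper.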
That is, bilattices enable a uniform approach for handling partial and inconsistent information, allowing reasoning about truth and knowledge in a single theoretical framework. In this paper, by combining the theory of AI with that of bilattices, we obtain a simple and elegant framework for deriving abstractions for Lµ. Due to the generality of bilattices, our results apply not only to the traditional two-valued interpretation of Lµ, but also to its multi-valued [4] and quantitative [10] interpretations.

The contribution of this paper is a general technique, based on AI, for deriving abstractions for model-checking. It allows understanding and comparing different techniques, and provides a methodology for proving soundness and precision of the desired abstraction. We then study this technique at two additional levels: first, we apply it to Lµ models, and then specialize it to abstracting transition systems represented as Kripke structures.

The rest of this paper is organized as follows: after providing the necessary background in Section 2, we show how to lift abstraction between elements to abstraction between sets of elements in Section 3. This gives us a general framework for approximating interpretations of Lµ. In Section 4, we derive abstractions for model-theoretic interpretations of Lµ, and then apply our technique to abstracting transition systems in Section 5. In Section 6, we specialize the results of Section 5 to boolean transition systems, and compare them to those obtained by Dams et al. [8]. We relate our technique to other abstraction approaches in Section 7 and summarize our contributions in Section 8.

2 Background

In this section, we introduce the basic concepts of lattice theory, modal µ-calculus, and abstract interpretation.

2.1 Lattices and Monotone Functions

A lattice is a partially ordered set L = (L, ≤) in which every subset B of L has a least upper bound, called join and denoted ⊔B, and a greatest lower bound, called meet and denoted ⊓B [9]. A lattice is distributive if meet and join distribute over each other, i.e., (a ⊓ b) ⊔ c = (a ⊔ c) ⊓ (b ⊔ c), and (a ⊔ b) ⊓ c = (a ⊓ c) ⊔ (b ⊓ c). A De Morgan algebra is a structure D = (L, ≤, −), where (L, ≤) is a distributive lattice and − : L → L is a negation that satisfies involution (−−a = a) and the De Morgan laws: −(a ⊓ b) = −a ⊔ −b, and −(a ⊔ b) = −a ⊓ −b.

We denote the space of functions from A to B by A → B, or B^A. For example, both A → [B → C] and (C^B)^A denote the space of functions from A to functions from B to C. Let A be a set and L = (L, ≤) be a lattice. The ordering and operations of L extend pointwise to L^A, i.e., f ≤ g ⇔ ∀a ∈ A · f(a) ≤ g(a). This turns L^A into a lattice with the same properties as L. In particular, if L is distributive or De Morgan, so is L^A.

A function f between two partially ordered sets (A, ≤) and (B, ⊑) is monotone (or, order-preserving) iff a ≤ b ⇒ f(a) ⊑ f(b), and anti-monotone iff a ≤ b ⇒ f(b) ⊑ f(a). We use upward (↑) and downward (↓) arrows to indicate monotone and anti-monotone functions, respectively. For example, [A → B]↑ denotes the space of all monotone functions from A to B, and (B^A)↓ denotes the space of all anti-monotone functions. Monotone and anti-monotone functions are closed under pointwise meet and join; thus, if B is a lattice, then so are [A → B]↑ and [A → B]↓.
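As a small illustration of these definitions, here is a Python sketch of a four-element diamond lattice together with the pointwise extension of its ordering to a function space; the element names are invented for the example.

    from itertools import product

    # The diamond lattice: bot <= a, b <= top, with a and b incomparable.
    ELEMS = ["bot", "a", "b", "top"]
    LEQ = {(x, x) for x in ELEMS} | {("bot", y) for y in ELEMS} | \
          {(x, "top") for x in ELEMS}

    def leq(x, y):
        return (x, y) in LEQ

    # Pointwise extension to L^A:  f <= g  iff  f(a) <= g(a) for every a in A.
    def leq_pointwise(f, g, domain):
        return all(leq(f[a], g[a]) for a in domain)

    # f : L -> L is monotone iff it preserves the ordering.
    def is_monotone(f):
        return all(leq(f[x], f[y]) for x, y in product(ELEMS, repeat=2) if leq(x, y))

    A = ["p", "q"]
    f = {"p": "bot", "q": "a"}
    g = {"p": "a", "q": "top"}
    assert leq_pointwise(f, g, A)     # so the function space L^A is itself ordered
    assert is_monotone({"bot": "bot", "a": "top", "b": "top", "top": "top"})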
2.2 Truth Domains and Sets

A truth-domain D is a collection of elements D, referred to as truth values, together with a truth ordering ⊑ and a negation operator ¬ : D → D, such that D = (D, ⊑, ¬) is a De Morgan algebra. The truth ordering orders the elements based on their truth content; thus, a ⊑ b stands for "a is less true than b". The meet (∧) and join (∨) of the truth ordering are called conjunction and disjunction, respectively.

[Fig. 1. (a)-(c) Truth domains: (a) 2-valued boolean logic, (b) Belnap logic, and (c) Fuzzy logic. (d) An abstract domain for Z.]

The best known truth domain is the classical boolean logic 2 with values true and false. Its truth ordering is shown in the Hasse diagram in Figure 1(a), with negation indicated in parentheses. Other examples include Belnap logic B, shown in Figure 1(b), which extends boolean logic with two additional values, m and d, representing "unknown" and "inconsistent", respectively; and Fuzzy logic F, shown in Figure 1(c). The truth values of F are formed by the set of all real numbers in the closed interval [0, 1], where 0 stands for false, 1 for true, and the remaining values stand for degrees of truth; furthermore, negation is defined as ¬x ≜ 1 − x, so that ¬0 = 1 and ¬1 = 0.

Given a collection of elements U, a set over U is a function from U to a truth domain. Thus, a boolean (or classical) set is a function from U to 2, a Belnap set is a function from U to B, and a fuzzy set is a function from U to F. Set ordering and operations are defined by pointwise extensions. Let S1, S2 : U → D be two sets. Then

S1 ⊆ S2 ≜ ∀x · S1(x) ⊑ S2(x)
∼S1 ≜ λx · ¬S1(x)
S1 ∪ S2 ≜ λx · S1(x) ∨ S2(x)
S1 ∩ S2 ≜ λx · S1(x) ∧ S2(x)
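For instance, with fuzzy logic as the truth domain, the pointwise definitions above can be sketched in a few lines of Python; the "tall" membership function and its thresholds are invented for the example.

    # A D-valued set over U is a function U -> D; set operations are the
    # pointwise liftings of the truth operations.  Here D is fuzzy logic on
    # [0, 1], whose meet/join are min/max and whose negation is 1 - x.

    def complement(S):         return lambda x: 1.0 - S(x)
    def union(S1, S2):         return lambda x: max(S1(x), S2(x))
    def intersection(S1, S2):  return lambda x: min(S1(x), S2(x))

    def subset(S1, S2, universe):          # S1 is a subset of S2 iff pointwise <=
        return all(S1(x) <= S2(x) for x in universe)

    tall = lambda h: min(1.0, max(0.0, (h - 150) / 50.0))   # h in centimetres
    short = complement(tall)
    heights = range(140, 211)
    assert subset(intersection(tall, short), tall, heights)
    assert union(tall, short)(175) == 0.5                   # fuzzy, not boolean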
2.3 Modal µ-Calculus

In this section, we describe the modal µ-calculus [20], or Lµ.

Definition 1. Let Var be a set of variables and AP be a set of atomic propositions. The logic Lµ(AP) is the set of all formulas satisfying the grammar

ϕ ::= p | z | ¬ϕ | ϕ ∧ ϕ | ♦ϕ | µz · ϕ(z),

where p ∈ AP, and z ∈ Var. Furthermore, z in µz · ϕ(z) must occur under the scope of an even number of negations in ϕ(z).

Additionally, we define the following syntactic abbreviations:

ϕ ∨ ψ ≜ ¬(¬ϕ ∧ ¬ψ)    ϕ ⇒ ψ ≜ ¬ϕ ∨ ψ    □ϕ ≜ ¬♦¬ϕ    νz · ϕ(z) ≜ ¬µz · ¬ϕ(¬z)

The modal operator ♦ is typically interpreted as "an existence of an immediate future". For example, "p" means that p holds now, "♦p" means that there exists an immediate future where p holds, and "□p" means that p holds in all immediate futures. The quantifiers µ and ν stand for least and greatest fixpoint, respectively. An occurrence of a variable z in a formula ϕ is bound if it appears in the scope of a µ quantifier and is free otherwise. For example, z is free in p ∨ ♦z, and is bound in µz · p ∨ ♦z. A formula ϕ is closed if it does not contain any free variables.

A set-based interpretation of Lµ over a set domain D^C is a mapping ||·|| from closed Lµ formulas to D-sets over C. The elements of C are often called states, and ||ϕ||(c) = v is read as "the degree to which ϕ is satisfied by (or, true in) a state c is v". An Lµ model is a structure M = (D^C, (p^M)p∈AP, ♦^M), where D^C is a set domain; for each p ∈ AP, p^M is in D^C; and ♦^M : D^C → D^C is a ⊆-monotone function. The set domain is called the universe of M, and p^M and ♦^M are interpretations of atomic propositions and the ♦ operator, respectively. A model M gives rise to an Lµ interpretation ||·||^M. The interpretation ||ϕ||^M_σ is defined inductively on the structure of the formula ϕ, where σ : Var → D^C is an object assignment for free variables:

||p||^M_σ ≜ p^M
||z||^M_σ ≜ σ(z)
||¬ϕ||^M_σ ≜ ∼||ϕ||^M_σ
||ϕ ∧ ψ||^M_σ ≜ ||ϕ||^M_σ ∩ ||ψ||^M_σ
||♦ϕ||^M_σ ≜ ♦^M(||ϕ||^M_σ)
||µz · ϕ||^M_σ ≜ lfp^⊆ λS · ||ϕ||^M_σ[z↦S]

where lfp^⊆ f is the ⊆-least fixpoint of f. For a closed Lµ formula ϕ, ||ϕ||^M_σ = ||ϕ||^M_σ′ for any σ and σ′. Thus, we write ||ϕ||^M for that value, and define it to be the model-based interpretation of ϕ.

Formulas of Lµ are often interpreted over Kripke structures. A Kripke structure is a tuple K = (AP, C, D, I, R), where AP is a collection of atomic propositions, C is a collection of elements (called states), D is a truth domain, I : AP → D^C is a mapping from atomic propositions to sets over C, and R : C → D^C is a transition function mapping each state to a set of its successors. For a transition function R, we define a pre-image operator pre[R] : D^C → D^C and its dual prẽ[R] as:

pre[R](Q)(s) ≜ ∨t∈C (R(s) ∩ Q)(t)
prẽ[R](Q) ≜ ∼pre[R](∼Q)

Intuitively, pre[R](Q)(s) is the degree to which the set R(s) of successors of s has a non-empty intersection with Q. A Kripke structure K = (AP, C, D, I, R) gives rise to an Lµ(AP) model M(K) = (D^C, (p^M(K))p∈AP, ♦^M(K)), where p^M(K) ≜ I(p), and ♦^M(K) ≜ pre[R]. Finally, the interpretation ||·||^K of Lµ in K is defined as ||ϕ||^K ≜ ||ϕ||^M(K).
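As an aside, the inductive semantics above yields a direct model-checking procedure for finite boolean Kripke structures; the following Python sketch evaluates the reachability formula µz · p ∨ ♦z by Kleene iteration (the structure and proposition are invented for the demo).

    # Evaluate || mu z . p \/ <>z || ("p is reachable") on a small boolean
    # Kripke structure by naive least-fixpoint iteration.

    states = {0, 1, 2, 3}
    R = {0: {1}, 1: {2}, 2: {2}, 3: {3}}     # transition function: state -> successors
    p = {2}                                  # I(p): states satisfying p

    def pre(Q):
        """pre[R](Q): states with at least one successor in Q (boolean case)."""
        return {s for s in states if R[s] & Q}

    def lfp(f):
        """Least fixpoint of a subset-monotone f on the finite lattice 2^states."""
        cur = set()
        while True:
            nxt = f(cur)
            if nxt == cur:
                return cur
            cur = nxt

    reach_p = lfp(lambda S: p | pre(S))
    assert reach_p == {0, 1, 2}              # state 3 can never reach p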
2.4 Abstract Interpretation

The framework of Abstract Interpretation (AI) provides a collection of tools for systematic design and analysis of semantic approximations [7]. The framework is very flexible and can be applied in various ways. Below, we give a brief overview of AI, summarizing the results used in our work.

Basics of Abstract Interpretation. Inputs to an AI framework are collections of concrete elements C and abstract elements A, called a concrete and an abstract domain, respectively. A notion of approximation, or abstraction, is formalized by a soundness relation ρ ⊆ A × C, where a ρ c is read as "a ρ-approximates c". A concretization function γ : A → 2^C maps each abstract element to the set of concrete elements corresponding to it: γ(a) ≜ {c | a ρ c}. An abstract element a is called consistent if γ(a) ≠ ∅; otherwise, we say a is inconsistent. The elements of A can be thought of as properties, such as "positive" or "odd", and γ(a) as the collection of concrete elements satisfying a.

The concretization γ induces an approximation ordering ≼ρ on A such that a ≼ρ b ⇔ γ(a) ⊇ γ(b). Intuitively, a ≼ρ b means that a approximates more concrete elements than b; therefore, a is less informative, or equivalently, less precise, than b. When viewed as a property, a is weaker than b. For example, knowing that an element is "positive" is less informative than knowing that it is both "positive" and "odd". In this paper, an abstract domain A is equipped with an information ordering ≼A such that (A, ≼A) is a lattice and a ≼A b ⇒ a ≼ρ b. Thus, we can study properties of an abstract domain independently of any particular soundness relation. Furthermore, we assume that A satisfies "the existence of a best approximation" [7], that is:

∀c ∈ C · ∃a ∈ A · (a ρ c ∧ ∀a′ ∈ A · a′ ρ c ⇒ γ(a′) ⊇ γ(a))

and use α : C → A to denote an abstraction function that maps each concrete element to its best approximation. Note that for a given c, A may have several best approximations; thus, α is not uniquely defined. In such cases, it is convenient to use the ≼A-largest α, so that ρ and γ can be expressed as a ρ c ⇔ a ≼A α(c) and c ∈ γ(a) ⇔ a ≼A α(c), respectively.

A lower bound with respect to ≼ρ is called a widening and is denoted by ∇. Intuitively, for a set Q ⊆ A, ∇Q is an abstract element representing the information common to all elements of Q, i.e., γ(∇Q) ⊇ ∪q∈Q γ(q). In particular, the greatest lower bound ⊓A of A is a widening. A widening ∇ is info-preserving if for any Q containing no inconsistent elements, ∇Q is the best representation of the information common to all elements of Q, i.e., ∀a ∈ A · γ(a) ⊇ ∪q∈Q γ(q) ⇒ γ(a) ⊇ γ(∇Q).

Abstract domains (A1, ≼1) and (A2, ≼2) are informationally equivalent if they represent the same degrees of information, that is, ∀a1 ∈ A1 · ∃a2 ∈ A2 · γ1(a1) = γ2(a2), and ∀a2 ∈ A2 · ∃a1 ∈ A1 · γ1(a1) = γ2(a2).

For the examples in this paper, we use the set of integers Z as the concrete domain, and the domain A, shown in Figure 1(d), as the abstract domain. The soundness relation ρe ⊆ A × Z is self-explanatory, e.g., 2 is ρe-approximated by pos&evn, evn, pos, and int, where pos&evn is its best abstract approximation. Similarly, γe(evn) is the set EVEN of all even numbers, γe(neg) is the set NEG of all negative numbers, γe(neg&evn) is NEG ∩ EVEN, etc.

Functional Abstraction. In practice, it is common to synthesize abstractions of complex structures using abstractions of their parts. A particular application is abstraction of functions, or functional abstraction [7]. Let A = A1 → A2 and C = C1 → C2 be collections of abstract and concrete functions, where A1 and A2 are abstract domains approximating C1 and C2, respectively. A soundness relation ρf ⊆ A × C is functional if g ρf-approximates f iff g preserves soundness of f. Formally, ρf satisfies

g ρf f ⇔ ∀a1 ∈ A1 · ∀c1 ∈ γ1(a1) · g(a1) ρ2 f(c1)    (functional soundness)

Let ∇ be a widening operator of A2, and α∇ : C → A be defined as

α∇(f)(a) ≜ ∇c∈γ1(a) α2(f(c))    (functional abstraction)

Then α∇(f) is a ρf-approximation of f, and its precision is determined by the precision of the widening operator used.

Theorem 1. [7] Let A, C, ρf, and α∇ be as above. If ∇ is info-preserving, then α∇(f) is the best ρf-approximation of f.

One of the main results of AI is that α∇ preserves fixpoints:

Theorem 2. [7] Let (C, ≼C) be a lattice, f : [C → C]↑ be a monotone function, and (A, ≼A) be a lattice approximating C via ρC. If the join operator ∨A of A preserves soundness, i.e., (αC(c1) ∨A αC(c2)) ≼A αC(c1 ∨C c2), then the least fixpoint of α∇(f) ρC-approximates the least fixpoint of f: lfp^A α∇(f) ≼ρC lfp^C f.

Functional Abstraction and Monotone Functions. Let A = [A1 → A2] be as above, and assume that A1 and A2 are equipped with information orderings ≼1 and ≼2, respectively. Then the set A↑ = [A1 → A2]↑ of ≼-monotone functions is informationally equivalent to A. Furthermore, if ∇ is an info-preserving widening of A2, then its pointwise extension to functions is also an info-preserving widening of A↑ [25]. Therefore, we always restrict the abstract domain of functional abstraction to ≼-monotone functions.
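The following Python sketch instantiates functional abstraction for the running example: the domain of Figure 1(d) is encoded as (sign, parity) pairs with None for "don't know", a finite slice of Z stands in for the concrete domain, and 0 is treated as positive; all of these encoding choices are assumptions made for the demo, not part of the paper's formal development.

    # alpha_widen(f)(a) = widening of { alpha(f(c)) | c in gamma(a) }, where
    # the widening keeps a component only if all elements agree on it.

    CONCRETE = range(-20, 20)                 # finite slice of Z, demo assumption

    def alpha(c):                             # best abstraction of an integer
        return ("pos" if c >= 0 else "neg", "evn" if c % 2 == 0 else "odd")

    def gamma(a):
        sign, par = a
        return [c for c in CONCRETE
                if (sign is None or alpha(c)[0] == sign)
                and (par is None or alpha(c)[1] == par)]

    def widen(elems):                         # info-meet of a set of pairs
        signs = {s for s, _ in elems}
        pars = {p for _, p in elems}
        return (signs.pop() if len(signs) == 1 else None,
                pars.pop() if len(pars) == 1 else None)

    def functional_abstraction(f):
        return lambda a: widen({alpha(f(c)) for c in gamma(a)})

    succ = functional_abstraction(lambda x: x + 1)
    assert succ(("pos", "evn")) == ("pos", "odd")   # even, >= 0 maps to odd, >= 0
    assert succ((None, "evn")) == (None, "odd")     # parity flips; sign is lost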
3 Abstract Sets

Sets play the role of basic blocks in the definition of Lµ semantics. In this section, we develop an abstraction of sets that preserves all set operations, including set complement. This abstraction gives us the necessary tools for abstracting Lµ models, which we do in Section 4. But it is independent of Lµ and can be used anywhere abstract sets are required.

We assume that C and A are a concrete and an abstract domain, respectively, related by a soundness relation ρe and an abstraction function αe. We aim to lift ρe to a soundness relation ρs between concrete sets, i.e., functions from C into a fixed truth domain D, and abstract sets, i.e., functions from A into some truth domain B (potentially different from D). The goal of ρs is to preserve set membership: that is, if Sα ρs-approximates S, then if a ∈ A ρe-approximates c, Sα(a) must approximate S(c). As always, we also want to know when Sα is a best approximation of a given set S. We view sets as functions, so it is natural to express ρs as a functional abstraction. For this, we must first identify the notion of an abstract truth domain B and settle on the meaning of "approximating truth values".

3.1 Bilattices as Abstract Truth Domains

Intuitively, an abstract truth-domain B is a truth-domain and, therefore, has a truth ordering and a negation. It is also an abstract domain and needs an information ordering. Furthermore, truth operations should not interfere with the information ordering. For example, if a and b are in B and a is less informative than b, then the negation of a (¬a) must be less informative than ¬b. A structure that captures our intuition is that of a bilattice, which has been introduced by Ginsberg [15] to enable reasoning with partiality and inconsistency. Here, we briefly describe distributive bilattices.

Definition 2. [15] A distributive bilattice is a structure B = (B, ≼, ⊑, ¬) such that: (1) Bi = (B, ≼) is a lattice and Bt = (B, ⊑, ¬) is a De Morgan algebra; (2) meet (⊓) and join (⊔) of Bi, and meet (∧) and join (∨) of Bt, are monotone with respect to both ≼ and ⊑; (3) all meets and joins distribute over each other; and (4) negation (¬) is ≼-monotone.

The ordering ≼ ranks elements of B with respect to information, and ⊑ ranks them with respect to truth. Operations ∧ and ∨ of Bt are called conjunction and disjunction. In the spirit of AI, we refer to ⊓ and ⊔ as widening and narrowing, respectively.

De Morgan algebras have a natural connection to bilattices.

Theorem 3. [15, 14] Let D = (D, ≤, −) be a De Morgan algebra, and B(D) be the structure (D × D, ≼, ⊑, ¬) such that

⟨a, b⟩ ≼ ⟨c, d⟩ ≜ a ≤ c ∧ b ≤ d
⟨a, b⟩ ⊑ ⟨c, d⟩ ≜ a ≤ c ∧ d ≤ b
¬⟨a, b⟩ ≜ ⟨b, a⟩

Then, B(D) is a distributive bilattice. Furthermore, every distributive bilattice is isomorphic to B(D) for some De Morgan algebra D.

For a truth-domain D, an element ⟨x, y⟩ of B(D) is interpreted as a truth value whose degree of truth is x and degree of falsity is y. For example, B(2) consists of four elements: ⟨t, f⟩ representing true (maximal degree of truth and minimal degree of falsity), ⟨f, t⟩ representing false, ⟨f, f⟩ representing lack of knowledge (minimal degree of both truth and falsity), and ⟨t, t⟩ representing an inconsistency, or disagreement (maximal degree of both truth and falsity). It is easy to verify that B(2) is exactly the Belnap logic shown in Figure 1(b). For convenience, we introduce projections πt and πf defined as πt(⟨x, y⟩) ≜ x and πf(⟨x, y⟩) ≜ y.

Guided by the above intuition, we say that B(D) is the abstract truth-domain corresponding to a truth domain D. Intuitively, ⟨x, y⟩ ∈ B(D) approximates c ∈ D if x is no more true than c, and y is no more false than c. In particular, ⟨c, −c⟩ is the best approximation of c.
Formally, this is captured by an abstraction function αt(c) ≜ ⟨c, −c⟩ and a soundness relation ρt ≜ {(a, c) | a ≼ αt(c)}. It is easy to verify that the truth operations of B(D), including negation, preserve soundness. That is, if a1 ≼ αt(c1) and a2 ≼ αt(c2), then a1 ∧ a2 ≼ αt(c1 ∧ c2), a1 ∨ a2 ≼ αt(c1 ∨ c2), and ¬a1 ≼ αt(¬c1). Furthermore, ⊓ is an info-preserving widening.

3.2 Set Abstraction

We now formally define the soundness relation ρs between concrete (C → D) and abstract (A → B(D)) sets as:

Sα ρs S ≜ ∀a ∈ A · ∀c ∈ γe(a) · Sα(a) ≼ αt(S(c))    (set soundness)

The soundness relation ρs is functional, and the corresponding abstraction function αs follows immediately from Theorem 1:

αs(S)(a) ≜ ⊓c∈γe(a) αt(S(c))    (set abstraction)

Note that αs(S)(a) = ⟨x, y⟩ means that the elements in γe(a) belong to S with a truth degree of at least x, and to ∼S with a truth degree of at least y. In particular, if S is a boolean set, then αs(S) is a Belnap set; αs(S)(a) is t iff γe(a) is contained in S, f iff γe(a) is contained in ∼S, m iff γe(a) is contained in neither S nor ∼S, and d iff γe(a) is contained in both S and ∼S.

[Fig. 2. (a) Abstracting Lµ: the top row summarizes soundness relations for abstracting Lµ interpretations; the middle one summarizes Lµ models, i.e., interpretations of atomic propositions and the ♦ relation; the bottom one summarizes Lµ-preserving abstractions of Kripke structures. (b) A fragment of the abstraction αT(R1), where R1(x) = x + 1.]

For example, the abstraction αs(EVEN) of the boolean set EVEN ∈ 2^Z is

αs(EVEN)(a) ≜ t if γe(a) ⊆ EVEN; f if γe(a) ⊆ ODD; m otherwise.

Note the difference between the abstract element evn and the abstract set αs(EVEN). The former represents the property of being an even number, and γe(evn) = EVEN is the set of all numbers having this property. On the other hand, αs(EVEN) represents a set that contains all even and no odd numbers; hence, γs(αs(EVEN)) = {EVEN} is a singleton containing the only set satisfying these conditions.
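For illustration, a small Python sketch of αs on the EVEN example, under the same demo assumptions as before (a finite slice of Z, 0 counted as positive, and Belnap values as boolean pairs):

    CONCRETE = range(-20, 20)                       # finite slice of Z (demo)

    def gamma(a):
        sign, par = a
        return [c for c in CONCRETE
                if (sign is None or ("pos" if c >= 0 else "neg") == sign)
                and (par is None or ("evn" if c % 2 == 0 else "odd") == par)]

    def alpha_t(b):                 # best Belnap approximation of a boolean
        return (b, not b)

    def widen(pairs):               # componentwise conjunction = info-meet
        return (all(x for x, _ in pairs), all(y for _, y in pairs))

    def alpha_s(S):                 # S: a boolean predicate on integers
        return lambda a: widen([alpha_t(S(c)) for c in gamma(a)])

    abs_even = alpha_s(lambda c: c % 2 == 0)
    assert abs_even((None, "evn")) == (True, False)   # t: gamma(evn) inside EVEN
    assert abs_even((None, "odd")) == (False, True)   # f: gamma(odd) inside ODD
    assert abs_even((None, None)) == (False, False)   # m: neither containment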
Recall that the set operations of B(D)^A are pointwise extensions of the corresponding operations of B(D); therefore, they preserve soundness. For example, if Sα ρs-approximates S, then ∼Sα ρs-approximates ∼S, etc. Finally, since ρs is functional, following the discussion in Section 2.4, we restrict the domain of abstract sets to ≼-monotone functions, i.e., to B(D)^A↑. Note that abstract set operations preserve ≼-monotonicity and do not interfere with this restriction. This gives us an abstract domain for sets that (a) preserves all set operations and (b) has an info-preserving widening. We use elements of this abstract domain as basic blocks for designing Lµ-preserving abstractions in the next section.

4 Abstract Interpretation for Modal µ-Calculus

In this section, we develop an abstraction of Lµ models that is sound w.r.t. satisfaction and refutation of all Lµ formulas, i.e., if an Lµ formula is satisfied (refuted) by the abstract model, it is satisfied (refuted) by the concrete one. We start by formalizing the notion of an Lµ-preserving approximation in the language of AI, and then systematically extend it to the desired abstraction. The top half of the diagram in Figure 2(a) illustrates the structures and relations discussed in this section, where solid lines represent relations between structures, and dashed lines those between their components.

We assume that C is a collection of concrete elements, called states, and D is a truth domain. Recall from Section 2.3 that an interpretation ||·|| of Lµ over a set domain D^C maps each closed Lµ formula to a D-set over C, where ||ϕ||(c) is the degree to which a formula ϕ is true in a state c. Let A be an abstract domain approximating C via a soundness relation ρe, and B(D) be an abstract truth domain approximating D via a soundness relation ρt, as defined in Section 3.1. Furthermore, let ||·||α be an interpretation of Lµ formulas as B(D)-sets over A. A natural way to extend the soundness relation ρe from states to Lµ interpretations is to say that ||·||α approximates ||·|| if for every Lµ formula ϕ and every abstract state a ∈ A, ||ϕ||α(a) approximates the degree to which ||ϕ|| is true in every concrete state c corresponding to a. We denote this soundness relation by ρi and formalize it using the set soundness relation ρs, defined in Section 3.2, as

||·||α ρi ||·|| ≜ ∀ϕ ∈ Lµ · ||ϕ||α ρs ||ϕ||    (Lµ soundness)

In this paper, we are only interested in the model-based interpretations of Lµ. A natural way to extend ρi to models is to say that a concrete model C is approximated by an abstract model A if the corresponding Lµ interpretation ||·||^C is approximated by ||·||^A. Formally, we define a model soundness relation ρm as

A ρm C ≜ ||·||^A ρi ||·||^C    (model soundness)

In the rest of this section, we employ the AI framework to construct an abstract model A that is a best ρm-approximation of a given concrete model C. As discussed in Section 3, we restrict the universe of A to ≼-monotone functions from A to B(D). We first outline the steps involved: (1) define a soundness relation ρ♦ between interpretations of the ♦ operator and derive the corresponding abstraction function α♦; (2) show that an abstract model A = (B(D)^A↑, (p^A)p∈AP, ♦^A) ρm-approximates a concrete model C = (D^C, (p^C)p∈AP, ♦^C) if for each p ∈ AP, p^A ρs-approximates p^C, and ♦^A ρ♦-approximates ♦^C; (3) conclude that the best approximation of C is given by αm(C) ≜ (B(D)^A↑, (αs(p^C))p∈AP, α♦(♦^C)).

Step 1. For a given Lµ model, an interpretation of modal formulas, i.e., formulas with ♦ but no fixpoint quantifiers, is determined by the model's interpretation of the ♦ operator. Thus, we define ρ♦ as follows:

♦^A ρ♦ ♦^C ≜ ∀X ∈ B(D)^A↑ · ∀Y ∈ γs(X) · ♦^A(X) ρs ♦^C(Y)    (♦-soundness)

Following Theorem 1, its corresponding abstraction function α♦ is defined as

α♦(♦^C)(X) ≜ ⊓Y∈γs(X) αs(♦^C(Y))    (♦-abstraction)

Step 2. To show that an abstract model A ρm-approximates a concrete model C if each component of A approximates the corresponding counterpart of C, we need to show that for any formula ϕ, ||ϕ||^A ρs-approximates ||ϕ||^C.

Theorem 4. Let C = (D^C, (p^C)p∈AP, ♦^C) be a concrete model, and A = (B(D)^A↑, (p^A)p∈AP, ♦^A) be an abstract model such that A approximates C via a soundness relation ρe. Then, A ρm C ⇐ ∀p ∈ AP · p^A ρs p^C ∧ ♦^A ρ♦ ♦^C.

The theorem is proved by structural induction on ϕ, using Theorem 2 for the cases where ϕ contains a fixpoint quantifier.
Step 3. Finally, we define an abstraction function αm that maps each concrete model to its best abstract approximation:

αm(C) ≜ (B(D)^A↑, (αs(p^C))p∈AP, α♦(♦^C))    (model abstraction)

For example, consider a concrete boolean model C = (2^Z, p^C, ♦^C), where p^C = EVEN and ♦^C = λS · {y | y + 1 ∈ S}. Then, ♦p is interpreted in C as ||♦p||^C = ♦^C(EVEN) = ODD, and in the abstraction of C as ||♦p||^αm(C) = α♦(♦^C)(αs(EVEN)) = αs(ODD).

The resulting abstraction function αm allows us to abstract Lµ models, obtaining abstractions which are both sound and precise. However, αm depends on an interpretation of the ♦ modality, which we have left unspecified. We study this subject below.

5 Abstraction of Kripke Structures

In practice, the ♦ modality is often interpreted using a Kripke structure. In this section, we are interested in conditions under which a Kripke structure over an abstract statespace (i.e., an abstract Kripke structure) is a best approximation of a given concrete one. We show that the framework of AI provides an elegant and almost mechanical way to answer this question.

Approximating Kripke Structures. Below, we aim to extend the soundness relation ρm between models to a soundness relation ρK between Kripke structures, and derive a corresponding abstraction function αK. Throughout this section, we assume that C = (C, D, I^C, R^C) is a concrete Kripke structure over concrete states C and a truth domain D, and A = (A, B(D), I^A, R^A) is an abstract Kripke structure, where A is an abstract domain related to C via ρe, and B(D) is an abstract truth domain related to D via ρt. The soundness relation ρK on Kripke structures is defined as a restriction of the model soundness relation ρm (see Figure 2(a)):

A ρK C ≜ M(A) ρm M(C)    (Kripke soundness)

By Theorem 4, ρK is decomposed over the components of the Kripke structure:

A ρK C ⇐ (∀p ∈ AP · I^A(p) ρs I^C(p)) ∧ R^A ρT R^C

where the relation ρT between transition functions is defined as:

R^A ρT R^C ≜ pre[R^A] ρ♦ pre[R^C]    (transition soundness)

The abstraction function αs corresponding to ρs has already been defined in Section 3.2. Thus, the only missing ingredient for defining αK is the transition abstraction αT. Unfortunately, the soundness relation ρT is not functional, making Theorem 1 inapplicable. However, we show below that ρT can easily be made functional. We begin by introducing an intersection operator isct:

isct(X)(S) ≜ ∨t (X ∩ S)(t)

which allows us to express the pre-image of a transition function R as pre[R](Q) = λs · isct(R(s))(Q). We then define a functional soundness relation ρisct (see Figure 3(a)):

isct(X) ρisct isct(Y) ≜ ∀S ∈ B(D)^A↑ · ∀Q ∈ γs(S) · isct(X)(S) ρt isct(Y)(Q)

Noticing that isct(X) is determined by the set X, we extend ρisct to a soundness relation ρ∩ between sets (see Figure 3(b)):

[Fig. 3. (a) Soundness relations between the ♦ modality and the transition function; (b) Detail of (a): relations ρisct and ρ∩]

X ρ∩ Y ≜ isct(X) ρisct isct(Y)    (successor soundness)

Finally, ρT is made functional:

R^A ρT R^C ⇔ pre[R^A] ρ♦ pre[R^C]
⇔ ∀a ∈ A · ∀c ∈ γe(a) · isct(R^A(a)) ρisct isct(R^C(c))
⇔ ∀a ∈ A · ∀c ∈ γe(a) · R^A(a) ρ∩ R^C(c)

However, ρ∩ is still not functional! Thus, before applying Theorem 1 to construct αT, we need to construct the abstraction function α∩ directly, i.e., without using Theorem 1.
We do so below.

Abstraction of Intersection. Intuitively, the ideal abstraction α∩ is such that the diagram in Figure 3(b) commutes. That is, α∩(X) = Y implies that αisct(isct(X)) = isct(Y). Note that ρisct is functional; thus, the definition of αisct(isct(X)) follows from Theorem 1. Following a standard technique of AI, we proceed to reorganize this definition until the emergence of conditions under which Y ∈ B(D)^A is the best ρ∩-abstraction of X. This derivation is simple but long, and is omitted from the paper; for details, please see the full version of this paper [19]. Here, we only show the final result.

Theorem 5. Let C and (A, ≼A) be a concrete and an abstract domain related by ρe, let D and B(D) be truth-domains related by ρt, and for X ∈ D^C, let α∩ be defined as

α∩(X)(a) ≜ ⟨∨c∈γe(a) X(c), ∧c∈γ̃e(a) ¬X(c)⟩

where γ̃e(a) ≜ {c ∈ C | αe(c) ≼A a} is a dual-concretization function. Then, αisct(isct(X)) = isct(α∩(X)).

To construct αT using Theorem 1, we need an info-preserving widening. The widening ⊓ on B(D)^A (the pointwise extension of ⊓ of B(D)) is not info-preserving in general. Instead, we restrict the abstract domain to the ≼-antimonotone functions, i.e., to B(D)^A↓, since (a) B(D)^A↓ is informationally equivalent to B(D)^A, and (b) it makes the pointwise widening ⊓ info-preserving. Note that α∩(X) is already ≼-antimonotone.

Abstraction of Transition Functions. Once the abstraction α∩ is defined, the abstraction of transition functions αT follows from Theorem 1:

αT(R^C)(a) ≜ ⊓c∈γe(a) α∩(R^C(c))    (transition abstraction)

By expanding α∩, αT can be alternatively expressed as:

πt(αT(R^C)(a)(b)) = ∧c∈γe(a) pre[R^C](γe(b))(c)
πf(αT(R^C)(a)(b)) = ∧c∈γe(a) prẽ[R^C](∼γ̃e(b))(c)

That is, if R^A = αT(R^C), then the transition R^A(a)(b) between abstract states a and b is as true as the least degree to which all concrete states in γe(a) have a successor in γe(b), and as false as the least degree to which all successors of states in γe(a) are not in γ̃e(b).

Note that so far, we have made no assumptions about the concrete transition function. However, if the concrete transition function R^C is boolean, then R^A = αT(R^C) is B(2)-valued and satisfies:

R^A(a)(b) = ⟨γe(a) ⊆ pre[R^C](γe(b)), γe(a) ⊆ prẽ[R^C](∼γ̃e(b))⟩

For example, let R1(x) ≜ x + 1. A fragment of its abstraction αT(R1) is shown in Figure 2(b), where pos, neg, and int are removed for clarity. For any even x, x + 1 is definitely odd, but it may be positive or negative. Thus, the transition from evn to odd is d, and the transitions to pos&odd and to neg&odd are m. Note that the pre-image of αT(R1) approximates the pre-image of R1, e.g., pre[αT(R1)](αs(EVEN)) = αs(ODD).
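A hedged Python sketch of αT for R1(x) = x + 1 reproduces the transitions of Figure 2(b); as in the earlier sketches, a finite slice of Z stands in for the concrete statespace, 0 counts as positive, and abstract elements are (sign, parity) pairs with None for unknown components; all of these are demo assumptions.

    # R_A(a)(b) = < gamma(a) inside pre[R](gamma(b)),
    #               gamma(a) inside pre_dual[R](complement of gamma_dual(b)) >.
    # R1 is deterministic, so pre and its dual coincide.

    CONCRETE = range(-20, 20)

    def alpha(c):
        return ("pos" if c >= 0 else "neg", "evn" if c % 2 == 0 else "odd")

    def gamma(a):
        s, p = a
        return {c for c in CONCRETE
                if (s is None or alpha(c)[0] == s) and (p is None or alpha(c)[1] == p)}

    def gamma_dual(a):
        # {c | alpha(c) is below a in the information ordering}; alpha(c) is
        # always fully determined here, so this is just {c | alpha(c) == a}.
        return {c for c in CONCRETE if alpha(c) == a}

    def pre(Q):                     # pre[R1]: states whose successor x+1 is in Q
        return {c for c in CONCRETE if c + 1 in Q}

    def abstract_transition(a, b):
        # Belnap pair: <every state in a has a successor in b, none has one in b>
        return (gamma(a) <= pre(gamma(b)),
                gamma(a) <= pre(set(CONCRETE) - gamma_dual(b)))

    evn = (None, "evn")
    assert abstract_transition(evn, (None, "odd")) == (True, True)     # d
    assert abstract_transition(evn, ("pos", "odd")) == (False, False)  # m
    assert abstract_transition(evn, ("neg", "odd")) == (False, False)  # m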
Finally, the best abstract Kripke structure αK(C) of a concrete Kripke structure C = (C, D, I^C, R^C) is obtained compositionally:

αK(C) ≜ (A, B(D), αs ◦ I^C, αT(R^C))    (Kripke abstraction)

Thus, we were able to systematically derive rules for abstracting Kripke structures by abstract Kripke structures. Note that the diagram in Figure 3(a) does not commute, i.e., α♦(pre[R]) ≠ pre[αT(R)]. Thus, for a given Kripke structure, its best abstraction by an abstract Lµ model is more precise than its best abstraction by an abstract Kripke structure. For example, let R2 be

R2(x) ≜ 2x if x ≥ 5 ∧ x ∈ ODD; −x if 0 ≤ x < 5 ∧ x ∈ ODD; −2 otherwise

and X ≜ (POS ∩ EVEN) ∪ (NEG ∩ ODD). Then, α♦(pre[R2])(αs(X))(pos&odd) = αs(POS ∩ ODD)(pos&odd) = t, but pre[αT(R2)](αs(X))(pos&odd) = m. This shows that transition systems are not necessarily the best abstract domain for Lµ-preserving abstractions.

6 Application: Abstraction of Classical Kripke Structures

In this section, we look at boolean Kripke structures and compare our abstraction to that of Dams et al. [8], which provides an alternative way of computing the best Lµ-preserving abstraction of Kripke structures.

We begin by addressing minor differences between the two approaches. First, the goal of [8] is to preserve satisfaction of positive Lµ, i.e., a fragment of Lµ with negation restricted to atomic propositions. Second, Kripke structures are abstracted by Mixed Transition Systems (MixTSs). Essentially, a MixTS is a Kripke structure with two separate transition relations, R^C and R^F, called constrained and free, respectively. The interpretation of Lµ over MixTSs is the same as its interpretation over Kripke structures, with the exception that ♦ is interpreted as pre[R^C] and □ as prẽ[R^F]. Note that positive Lµ is as expressive as full Lµ: for every Lµ formula ϕ there exists an equivalent positive formula NNF(ϕ), its negation normal form. Thus, an abstraction that preserves positive Lµ easily extends to full Lµ. Furthermore, the next theorem shows that MixTSs are equivalent to B(2)-valued Kripke structures.

Theorem 6. Let T be a MixTS with statespace A and transition functions R^C and R^F, and K be a B(2)-valued Kripke structure with the same statespace and a transition function R^K such that R^K(a)(b) = ⟨R^C(a)(b), ¬R^F(a)(b)⟩. Then, for any Lµ formula ϕ, ||ϕ||^K = ⟨||NNF(ϕ)||^T, ||NNF(¬ϕ)||^T⟩.
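Theorem 6 translates directly into code; here is a minimal Python sketch of the conversion, on a made-up two-state MixTS in which every constrained transition is also free.

    def mixts_to_belnap_kripke(states, RC, RF):
        """Build R_K(a)(b) = < RC(a)(b), not RF(a)(b) > from a MixTS.
        RC, RF: dicts mapping a state to its set of successors."""
        return {a: {b: (b in RC[a], b not in RF[a]) for b in states}
                for a in states}

    S = {0, 1}
    RC = {0: {1}, 1: set()}          # constrained ("must") transitions
    RF = {0: {0, 1}, 1: {1}}         # free ("may") transitions; RC inside RF
    RK = mixts_to_belnap_kripke(S, RC, RF)

    assert RK[0][1] == (True, False)    # t: the transition must exist
    assert RK[0][0] == (False, False)   # m: it may or may not exist
    assert RK[1][0] == (False, True)    # f: the transition cannot exist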
Thus, in the case of boolean Kripke structures, the abstraction developed in this paper is equivalent to that of [8]: the same structures are used as an abstract domain, and exactly the same Lµ formulas are preserved. However, unlike the approach taken in [8], our work systematically derives both the abstraction and the notion of abstract Kripke structures from Lµ-preservation and the soundness relation ρs between concrete and abstract sets.

It is interesting to note that although the two abstractions are equivalent w.r.t. satisfaction of Lµ, they are not identical. For completeness, Dams et al. show that the most precise MixTS abstracting a Kripke structure satisfies the following conditions:

R^C(a, b) ⇔ b ∈ {⊓y∈Y αe(y) | Y ∈ min{Y′ | R∀∃(γe(a), Y′)}}
R^F(a, b) ⇔ b ∈ {⊓y∈Y αe(y) | Y ∈ min{Y′ | R∃∃(γe(a), Y′)}}

where R∀∃(S, T) ≜ ∀s ∈ S · ∃t ∈ T · R(s)(t) and R∃∃(S, T) ≜ ∃s ∈ S · ∃t ∈ T · R(s)(t). It is different from our abstraction αT, which, when put in this notation, is:

αT(R)(a)(b) = ⟨R∀∃(γe(a), γe(b)), ¬R∃∃(γe(a), γ̃e(b))⟩

We believe that our characterization is simpler; however, it remains to be seen whether it is also more useful in practice, e.g., whether it leads to a smaller symbolic representation, or is easier to construct compositionally, etc. We leave this topic for future work.

7 Related Work

Over the years, many abstraction methods have been developed for Lµ model-checking [5, 8, 12, 17, 21, 22, 24]. They concentrate on a specific model, namely transition systems, and most of them preserve soundness (satisfaction) for fragments of Lµ: if an abstract system is an over-approximation of the concrete one, the abstraction is sound for all universal properties. Similarly, a sound abstraction for existential properties comes from under-approximation.

The first approach for sound abstraction of full Lµ was proposed by Larsen and Thomsen [21]. They have shown that Modal Transition Systems (MTSs) can be used to combine both over- and under-approximations. However, the goal of that work was not abstraction, and it did not consider the problem of how to abstract a Kripke structure using an MTS. The construction problem is addressed by Dams et al. [8], who independently proposed using MixTSs, a slight generalization of MTSs, as abstract models, and provided conditions for constructing a MixTS with the best precision. Although this work uses AI to describe the relationship between concrete and abstract statespaces, abstract transition systems are not derived systematically; instead, the optimal conditions are defined based on intuition, and both soundness and optimality of precision require separate proofs.

Among the attempts to use AI to systematically derive best abstractions, the works of Loiseaux et al. [22] and Schmidt [23] are the closest to ours. [22] showed how to derive a simulation-based sound abstract transition system from Galois connections within the AI framework, but their results apply only to the universal fragment of Lµ. Motivated by the study of MixTSs, [23] showed how to capture over- and under-approximations between transition systems using AI and systematically derived Dams's most precise results. However, the starting goal of that work was formalizing the over- and the under-approximations, restricting the result to specific Lµ models, namely, transition systems. In our work, on the other hand, we start by formalizing the notion of soundness of Lµ interpretations (via the soundness relation ρi in Section 4), the most general and exact goal of abstraction for Lµ, and then systematically derive conditions which guarantee the best precision of the abstraction. Thus, our results can be applied to different Lµ models, where abstracting transition systems is just a special case.

Another important feature of our work is the use of bilattices. The approaches of [8, 23] develop best over- and under-approximations separately, whereas our combination of AI with bilattices provides a uniform way to abstract both satisfaction and refutation of Lµ. Multi-valued logic has been previously combined with abstraction in the form of 3-valued transition systems (e.g., [16]). However, these results do not use the framework of AI and, in particular, only deal with soundness, not with the precision of the abstraction. Furthermore, 3-valued Kripke structures (unlike those based on Belnap logic) lack monotonicity [24]: a more refined abstract domain does not necessarily result in a more precise abstraction, and thus the most precise abstraction may not even exist.

8 Conclusion

In this paper, we have shown that abstract interpretation provides a systematic way of designing abstractions for model-checking. On the one hand, our work can be seen as recreating the pioneering work of Dams et al. [8] in a systematic setting, where each step in designing an abstraction and each loss of precision can be traced back to either the choice of an abstract domain or the requirements on the abstract structure. On the other hand, our work also extends their results to non-traditional interpretations of Lµ, such as its multi-valued [4] and quantitative [10] interpretations. To the best of our knowledge, this is the first abstraction technique that can be applied to these non-classical interpretations.
In this paper, we lay the basic groundwork for designing Lµ-preserving abstractions using the framework of AI. However, our work can easily be extended in a number of directions. We discuss a few of them below.

We have shown that requiring that an abstraction of a transition system be a transition system as well comes with a loss of precision. Thus, it may be interesting to explore how a transition system can be abstracted directly by an abstract Lµ model. Such models will require new model-checking algorithms, but will provide additional precision, and will possibly be easier to construct. For example, recent work on symmetry reduction [13] argues that instead of constructing a reduced abstract model, the symmetry-reduced ♦ modality can be implemented directly by putting symmetry reduction inside the model-checking algorithm. We believe that our framework can be used to extend this approach to other, non-symmetry-induced, abstract domains. Our work on the software model-checker YASM [18] is a first step in this direction.

In designing abstractions of Kripke structures, we have assumed that the domain and range of the transition function are abstracted by the same abstract domain. This need not be the case. By using different but related abstract domains, we obtain a generalization of "hyper-transition abstractions" [24, 11] to arbitrary abstract domains.

Although not shown explicitly in the paper, the pointwise extension of the bilattice narrowing operator ⊔ to abstract structures provides a simple way to combine several, not necessarily best, abstractions. This allows us to study incremental construction of abstractions, such as the one in [1].

We believe that our framework provides an interesting starting point for exploring the connection between AI and model-checking, and hope to continue this line of research in the future.

References

1. T. Ball, V. Levin, and F. Xie. "Automatic Creation of Environment Models via Training". In TACAS'04, volume 2988 of LNCS, pages 93–107, 2004.
2. T. Ball and S. Rajamani. "The SLAM Toolkit". In CAV'01, volume 2102 of LNCS, pages 260–264, 2001.
3. N.D. Belnap. "A Useful Four-Valued Logic". In Dunn and Epstein, editors, Modern Uses of Multiple-Valued Logic, pages 30–56. Reidel, 1977.
4. M. Chechik, B. Devereux, S. Easterbrook, and A. Gurfinkel. "Multi-Valued Symbolic Model-Checking". ACM TOSEM, 12(4):1–38, 2003.
5. E.M. Clarke, O. Grumberg, and D.E. Long. "Model Checking and Abstraction". ACM TOPLAS, 16(5):1512–1542, 1994.
6. J. Corbett, M. Dwyer, J. Hatcliff, S. Laubach, C. Pasareanu, Robby, and H. Zheng. "Bandera: Extracting Finite-state Models from Java Source Code". In ICSE'00, pages 439–448, 2000.
7. P. Cousot and R. Cousot. "Abstract Interpretation Frameworks". Journal of Logic and Computation, 2(4):511–547, 1992.
8. D. Dams, R. Gerth, and O. Grumberg. "Abstract Interpretation of Reactive Systems". ACM TOPLAS, 2(19):253–291, 1997.
9. B.A. Davey and H.A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 1990.
10. L. de Alfaro, M. Faella, T.A. Henzinger, R. Majumdar, and M. Stoelinga. "Model Checking Discounted Temporal Properties". In TACAS'04, volume 2988 of LNCS, pages 77–92, 2004.
11. L. de Alfaro, P. Godefroid, and R. Jagadeesan. "Three-Valued Abstractions of Games: Uncertainty, but with Precision". In LICS'04, pages 170–179, 2004.
12. E.A. Emerson and A.P. Sistla. "Symmetry and Model Checking". FMSD, 9(1-2):105–131, 1996.
13. E.A. Emerson and T. Wahl. "Dynamic Symmetry Reduction". In TACAS'05, volume 3440 of LNCS, pages 382–396, 2005.
14. M. Fitting. "Bilattices are Nice Things". In Conference on Self-Reference, 2002.
15. M.L. Ginsberg. "Multivalued Logics: A Uniform Approach to Reasoning in Artificial Intelligence". Computational Intelligence, 4(3):265–316, 1988.
16. P. Godefroid, M. Huth, and R. Jagadeesan. "Abstraction-based Model Checking using Modal Transition Systems". In CONCUR'01, volume 2154 of LNCS, pages 426–440, 2001.
17. S. Graf and H. Saïdi. "Construction of Abstract State Graphs with PVS". In CAV'97, volume 1254 of LNCS, pages 72–83, 1997.
18. A. Gurfinkel and M. Chechik. "Yasm: Model-Checking Software with Belnap Logic". Technical Report 533, University of Toronto, April 2005.
19. A. Gurfinkel, O. Wei, and M. Chechik. "Logical Abstract Interpretation". Technical Report 532, University of Toronto, September 2005.
20. D. Kozen. "Results on the Propositional µ-calculus". Theoretical Computer Science, 27:333–354, 1983.
21. K.G. Larsen and B. Thomsen. "A Modal Process Logic". In LICS'88, pages 203–210, 1988.
22. C. Loiseaux, S. Graf, J. Sifakis, A. Bouajjani, and S. Bensalem. "Property Preserving Abstractions for the Verification of Concurrent Systems". FMSD, 6:1–35, 1995.
23. D.A. Schmidt. "Closed and Logical Relations for Over- and Under-Approximation of Powersets". In SAS'04, volume 3148 of LNCS, pages 22–37, 2004.
24. S. Shoham and O. Grumberg. "Monotonic Abstraction-Refinement for CTL". In TACAS'04, LNCS, pages 546–560, April 2004.
25. O. Wei, A. Gurfinkel, and M. Chechik. "Identification and Counter Abstraction for Full Virtual Symmetry". In CHARME'05, volume 3725 of LNCS, 2005.

Totally Clairvoyant Scheduling with Relative Timing Constraints

K. Subramani

LDCSEE, West Virginia University, Morgantown, WV
ksmani@csee.wvu.edu

Abstract. Traditional scheduling models assume that the execution time of a job in a periodic job-set is constant in every instance of its execution. This assumption does not hold in real-time systems, wherein job execution time is known to vary. A second feature of traditional models is their lack of expressiveness, in that constraints more complex than precedence constraints (for instance, relative timing constraints) cannot be modeled. Thirdly, the schedulability of a real-time system depends upon the degree of clairvoyance afforded to the dispatcher. In this paper, we shall discuss Totally Clairvoyant Scheduling, as modeled within the E-T-C scheduling framework [Sub05]. We show that this instantiation of the scheduling framework captures the central issues in a real-time flow-shop scheduling problem and devise a polynomial time sequential algorithm for the same. The design of the polynomial time algorithm involves the development of a new technique, which we term Mutable Dynamic Programming. We expect that this technique will find applications in other areas of system design, such as Validation and Software Verification.

1 Introduction

Real-time scheduling is concerned with the scheduling of computer jobs which are part of periodic job-sets. The execution times of these jobs are known to vary as we move from one period to the next [AB98]. The most common cause for this feature is the presence of input-dependent loops in the program; the time taken to execute the loop structure for(i=1 to N) will in general be smaller when N=10 than when N=1000.
A second reason for this variance is the statistical error associated with measuring execution times [LTCA89]. Consequently, the traditional approach of assuming a fixed execution time for jobs [Pin95] may not be appropriate in "hard" real-time situations, where scheduling policies should hold regardless of the actual time taken to execute by each job. (The research of this author was supported in part by the Air Force Office of Scientific Research.)

Traditional models suffer from a second drawback, viz., the inability to specify complex constraints such as relative timing constraints. The literature on deterministic scheduling focuses almost exclusively on ready-time, deadline, and precedence constraints [GLLK79]. In real-time applications, though, there is often the necessity to constrain jobs through relationships of the form: start job J5 at least 5 units after job J2 terminates; start job J5 within 12 units of job J2 starting. Such relationships cannot be captured through precedence graphs, which are, by definition, directed acyclic graphs, whereas systems of relative timing constraints in real-time scheduling clearly contain cycles.

An important feature of any scheduling model is the schedulability predicate, i.e., what it means for a job-set to be schedulable [Sub05]. In fact, the complexity of the scheduling problem under consideration is determined in large part by the type of guarantee that we wish to provide. In this paper, our focus is on providing a polynomial time algorithm for Totally Clairvoyant Scheduling in the presence of relative timing constraints. The principal contributions of this paper are as follows: (a) modeling a flow-shop problem as an instance of Totally Clairvoyant scheduling with relative timing constraints, (b) developing a polynomial time algorithm for this problem, and (c) introducing a new algorithmic technique called Mutable Dynamic Programming (see Sections §4 and §6). It is to be noted that at its heart, the Totally Clairvoyant Scheduling problem is concerned with the verification of quantified expressions. Such quantified expressions are often found in the modeling of continuous real-time and embedded systems, and hence our algorithm can be thought of as an efficient verification mechanism for these kinds of problems.

2 Statement of Problem

In this section, we detail a formal description of the problem under consideration.

2.1 Job Model

Assume an infinite time-axis divided into windows of length L, starting at time t = 0. These windows are called periods or scheduling windows. There is a set of non-preemptive, ordered jobs, J = {J1, J2, . . . , Jn}, that executes in each scheduling window. The occurrences of the same job in different windows are referred to as instances of that job. The jobs must execute in the sequence J1, J2, . . . , Jn.

2.2 Constraint Model

The constraints on the jobs are described by System (1):

A · [s e]^T ≤ b, e ∈ E,    (1)

where

(a) A is an m × 2·n rational matrix and b is a rational m-vector; (A, b) is called the initial constraint matrix.
(b) E is an axis-parallel hyper-rectangle (aph), which is represented as the product of n closed intervals [li, ui], i.e.,

E = [l1, u1] × [l2, u2] × . . . × [ln, un]    (2)
We are modeling the fact that the execution time of a task can take any value in the range [li, ui] during actual execution and is not a fixed constant. Observe that E can be represented as a polyhedral system M · e ≤ m having 2·n constraints and n variables.
(c) s = [s1, s2, . . . , sn]^T is the start time vector of the jobs, and
(d) e = [e1, e2, . . . , en]^T ∈ E is the execution time vector of the jobs.

We reiterate that e could be different in different windows, i.e., different instances of the same job could have different execution times (within the specified interval [li, ui] for Ji) in different scheduling windows. However, in any particular period, the execution time of the job is fixed and known at the start of the period. The jobs are non-preemptive; hence the finish time of job Ji (with start time si) is si + ei. The expressive power of the scheduling framework is therefore not enhanced by introducing separate finish time variables to model constraints. The ordering on the jobs is achieved by the constraint set si + ei ≤ si+1, ∀i = 1, 2, . . . , n−1; these constraints are part of the A matrix. In the absence of ordering, the constraints on the job system cannot be captured through a polynomial-sized linear system, unless P=NP, since integer variables would be required to enforce non-preemption [Hoc96, Pin95].

We only permit relative timing constraints between jobs. These constraints are of the form si + ei ≤ sj + ej + a, si + ei ≤ sj + a, si ≤ sj + ej + a, si ≤ sj + a, where a is an arbitrary integer, and they express relative timing (distance) constraints between the jobs Ji and Jj. As indicated, the constraints can exist between start or finish times of the jobs. Note that these constraints are a superset of absolute constraints, i.e., constraints of the form si ≤ a, si ≥ a or si + ei ≤ a, si + ei ≥ a, where a is some positive integer. The above constraints have also been called "standard" constraints [GPS95] in the literature. We shall be using the terms "standard constraint" and "relative constraint" interchangeably for the rest of the discussion.

2.3 Query Model

In the real-time applications that we consider, such as Flow-Shop (see Section §3), it is possible to calculate with sufficient accuracy the execution times of the jobs in the current period and a few periods into the future. Totally Clairvoyant Scheduling assumes knowledge of the execution time of every job in the job-set at the start of each scheduling window; the execution time vector may be different in different windows. We wish to enforce the condition that the constraints described by System (1) are met in each scheduling window, regardless of the actual execution times of the jobs. Further, the start-time vector can depend upon the execution time vector of that window. We are now in a position to formally state the Totally Clairvoyant schedulability query:

Q : ∀e = [e1, e2, . . . , en] ∈ E ∃s = [s1, s2, . . . , sn] A · [s e]^T ≤ b ?    (3)

The focus of this paper is on the design of a polynomial time procedure to decide Query (3) (henceforth Q).
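A tiny Python sketch may help fix the semantics of Query (3): for a two-job instance (all numbers invented for the demo), the start times are allowed to depend on the execution times revealed at the start of the window.

    # Two jobs with a relative "cooling" constraint s1 + e1 + 5 <= s2 (which
    # subsumes the ordering constraint s1 + e1 <= s2) and a window deadline
    # s2 + e2 <= L.

    L = 100
    E = [(2, 4), (3, 6)]                 # execution-time intervals [l_i, u_i]

    def satisfies(s, e):
        """Does the start-time vector s meet all constraints for this e?"""
        (s1, s2), (e1, e2) = s, e
        return s1 >= 0 and s1 + e1 + 5 <= s2 and s2 + e2 <= L

    # Totally Clairvoyant dispatching: e is known when the window starts, so
    # s may be chosen as a function of e, e.g. greedily:
    def dispatch(e):
        return (0, e[0] + 5)

    for e1 in E[0]:                      # checking the extreme points (demo)
        for e2 in E[1]:
            assert satisfies(dispatch((e1, e2)), (e1, e2))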
3 Motivation

In this section, we show that a practical real-time scheduling problem can be captured as an instance of Totally Clairvoyant scheduling with relative timing constraints.

[Fig. 1. Bounded-buffer flow shop: machines M1 through Mn and feed-buffers A, B, and C holding objects of different sizes; solid arrows indicate the flow direction, dashed arrows indicate relative timing constraints.]

Figure (1) represents a bounded-buffer flow shop. The flow shop consists of n machines M1 through Mn and one or more feed-buffers (or feeders); in our example these buffers are A, B, and C. Objects to be tooled, also called jobs, are placed in these buffers. The timeline on which the flow shop operates is divided into equal-length portions called periods. At the start of each period, the job in each buffer moves to the buffer ahead of it, while the job in the first buffer (Buffer A) enters machine M1. Within the period, the job moves sequentially from machine Mi to machine Mi+1, respecting the relative timing constraints (represented by the curved, broken arrows), and finally exits at machine Mn before the end of the period. Relative timing constraints are used to capture relationships such as heating and cooling requirements; for instance, the requirement that the object should wait 5 units of time after exiting machine M1 before it enters machine M2 is represented as s2 ≥ s1 + e1 + 5, where s2 is the time at which the object enters M2 and s1 + e1 is the time at which it exits M1. This process is repeated in every period, with new objects continuously entering the flow at the last buffer. (This example is taken from [Pin95].)

Let si denote the time at which machine Mi begins operating on the current job and let ei denote the time it takes to complete its operation on the job. Observe that the operation time of machine Mi on a job, i.e., ei, is a non-decreasing function of the job size. As shown in Figure (1), the buffer pipeline is populated by jobs of different sizes, and hence ei is different for different jobs.

Design Problem: Given
1. timing constraints between the flow shop machines, and
2. lower and upper bounds on the operation time of each machine,
does there exist a valid schedule, i.e., a schedule that respects the timing constraints, for any job with size sz, where sz^l_i ≤ sz ≤ sz^u_i, i = 1, 2, . . . , n?

The flow-shop example in this section is easily modeled as a Totally Clairvoyant scheduling problem, with the machines acting as the jobs with variable execution times.

4 Related Work

In [Sub05], we introduced the E-T-C scheduling framework as a model to identify and represent issues in real-time scheduling. Within that framework, Zero-Clairvoyant scheduling has been addressed in [Sub02], and Partially Clairvoyant Scheduling has been detailed in [GPS95, CA00]. This is the first paper on Totally Clairvoyant scheduling. We point out that the variable elimination techniques used for Partially Clairvoyant scheduling do not seem to work in the case of Totally Clairvoyant scheduling, for the following reason: standard constraints are preserved under job elimination in Partially Clairvoyant scheduling, whereas they are not preserved under job elimination in Totally Clairvoyant Scheduling. This has led us to develop a novel approach for the Totally Clairvoyant scheduling problem, which we term Mutable Dynamic Programming. Orthogonal approaches to the issues of clairvoyance and speed have been discussed extensively in [KP00]. A number of online scheduling models with and without clairvoyance are discussed in [FW98]; however, their primary concern is optimizing performance metrics in the presence of multiple processors, whereas we are concerned with checking feasibility on a single processor.

5 The Complement Problem

A simple technique to test the schedulability of a Totally Clairvoyant system is as follows. Let e1, e2, . . . , el be the extreme points of E. Substitute each extreme point of E into the constraint system A · [s e]^T ≤ b and declare Q to be true if and only if each of the resulting linear systems (in the start-time variables) is feasible, since any execution time vector e ∈ E can be expressed as a convex combination of the extreme points of E. Unfortunately, such a strategy takes Ω(2^n) time, since E has 2^n extreme points.
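The naive strategy just described can be sketched in Python: after substituting an extreme point e, the constraints become difference constraints on the start times, so feasibility reduces to the absence of a negative cycle, checked here with Bellman-Ford; the two-job instances are invented for the demo.

    from itertools import product

    def has_negative_cycle(n, edges):
        """edges: (u, v, w) encodes s_v - s_u <= w.  Bellman-Ford from a
        virtual source that reaches every vertex with distance 0."""
        dist = [0] * n
        for _ in range(n):
            for u, v, w in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        return any(dist[u] + w < dist[v] for u, v, w in edges)

    def naive_totally_clairvoyant(n, constraints, E):
        """constraints: (i, j, f) meaning s_i - s_j <= f(e).  Tries all 2^n
        extreme points of the aph E, i.e. the Omega(2^n) strategy above."""
        for e in product(*E):
            edges = [(j, i, f(e)) for i, j, f in constraints]
            if has_negative_cycle(n, edges):
                return False
        return True

    ok = [(0, 1, lambda e: -e[0]),        # s0 + e0 <= s1
          (1, 0, lambda e: e[0] + 10)]    # s1 <= s0 + e0 + 10
    bad = [(0, 1, lambda e: -e[0]),       # s0 + e0 <= s1
           (1, 0, lambda e: 3)]           # s1 <= s0 + 3
    assert naive_totally_clairvoyant(2, ok, [(2, 4), (1, 3)])
    assert not naive_totally_clairvoyant(2, bad, [(2, 4), (1, 3)])  # fails at e0 = 4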
5 The Complement Problem

A simple technique to test the schedulability of a Totally Clairvoyant system is as follows: let e1, e2, . . . , el be the extreme points of E. Substitute each extreme point of E in the constraint system A · [s e]^T ≤ b and declare Q to be true if and only if each of the resulting linear systems (in the start-time variables) is feasible; this is correct because any execution time vector e ∈ E can be expressed as a convex combination of the extreme points of E. Unfortunately, such a strategy takes Ω(2^n) time, since E has 2^n extreme points. In this section, we shall study the complement of Query (3); our insights into the complement problem will be used to develop a polynomial time algorithm in Section 6. Let us rewrite Query (3) as:

∀e ∈ E ∃s  G · s + H · e ≤ b, s ≥ 0 ?  (4)

The complement of Query (4) is:

∃e ∈ E ∀s  G · s + H · e ≰ b, s ≥ 0 ?  (5)

where the notation A · x ≰ b means that at least one of the constraints is violated. By applying Farkas' Lemma [Sch87], we know that Query (5) is true if and only if the query:

∃y ∃e ∈ E  y · G ≥ 0, y · (b − H · e) < 0, y ≥ 0 ?  (6)

is true. Construct the weighted directed graph G = ⟨V, F, c⟩ as follows:
1. Corresponding to each start time variable si, add the vertex vi to V.
2. Corresponding to each constraint of the form lp : si (+ei) ≤ sj (+ej) + k, add a directed edge of the form vi ⇝ vj having cost c_lp = (ej) − (ei) + k.
G is called the constraint graph corresponding to the constraint system A · [s e]^T ≤ b.

Remark 5.1. In the above construction, it is possible that there exist multiple edges between the same pair of vertices; hence, technically, G is a constraint multi-graph.

Definition 1. Let p = vi ⇝ vj ⇝ . . . ⇝ vq denote a simple path in G; the co-static cost of p is calculated as follows:
1. Symbolically add up the costs on all the edges that make up the path p to get an affine function f(p) = r · e − k (r = [r1 r2 . . . rn]^T, e = [e1 e2 . . . en]^T) for suitably chosen r and k. Note that each ri belongs to the set {0, 1, −1}: on any simple path which is not a cycle, each vertex is encountered exactly once, so an execution time variable can be encountered at most twice, and if it is encountered twice, the two occurrences have opposite sign and cancel each other out.
2. Compute a numerical value for f(p) by substituting ei = li if ri ≥ 0, and ei = ui otherwise. This computed value is called the co-static cost of p. In other words, the co-static cost of path p is min_E f(p) = min_E (r · e − k).

The co-static cost of a simple cycle is calculated similarly; if the cost of a cycle C in G is negative, then C is called a negative co-static cycle. The only difference between a simple path and a simple cycle is that one vertex occurs twice in the cycle; even in this case, it is easily seen that ri ∈ {0, 1, −1}.
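The substitution rule in Definition 1 is directly implementable. Here is a small sketch (names are ours, not the paper's) that evaluates the co-static cost of a symbolic path cost f(p) = r · e − k over the box E.

def co_static_cost(r, k, lowers, uppers):
    """min over E of r . e - k, per Definition 1: since each r_i lies in
    {0, 1, -1}, the minimum is attained by taking e_i = l_i when r_i >= 0
    and e_i = u_i when r_i < 0."""
    total = -k
    for ri, li, ui in zip(r, lowers, uppers):
        total += ri * (li if ri >= 0 else ui)
    return total

# Path cost f(p) = e_1 - e_3 - 4 over E = [2,5] x [1,4] x [3,6]:
# the minimum is 2 - 6 - 4 = -8.
assert co_static_cost([1, 0, -1], 4, [2, 1, 3], [5, 4, 6]) == -8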
Theorem 1. A Totally Clairvoyant scheduling constraint system over a system of relative constraints has a solution if and only if its constraint graph does not have a simple negative co-static cycle.

Proof: Assume that the constraint graph has a negative co-static cycle C1 defined by {v1 ⇝ v2 ⇝ . . . ⇝ vk ⇝ v1}; the corresponding set of constraints in the constraint set is:

s1 − s2 ≤ f1(e1, e2)
s2 − s3 ≤ f2(e2, e3)
. . .
sk − s1 ≤ fk(ek, e1)

Now, assume that there exists a solution s to the constraint system. Adding up the inequalities in the above system, we get ∀e ∈ E, 0 ≤ Σ_{i=1}^{k} fi(ei, ei+1), where the indexes are modulo k. But we know that C1 is a negative co-static cycle; it follows that min_{e∈E} Σ_{i=1}^{k} fi(ei, ei+1) < 0; thus, we cannot have ∀e ∈ E, Σ_{i=1}^{k} fi(ei, ei+1) ≥ 0, contradicting the hypothesis.

Now consider the case where there does not exist a negative co-static cycle. Let Ge = ⟨V, F, ce⟩ denote the constraint graph that results from substituting e ∈ E into the constraint system defined by System (1). It follows that for all e ∈ E, Ge does not have any negative cost cycles. Hence, for each e ∈ E, the corresponding constraint system in the start-time variables has a solution (the vector of shortest path distances, see [CLR92]). In other words, the schedulability query Q is true. □

Our efforts in the next section will be directed towards detecting the existence of negative co-static cycles in the constraint graph corresponding to a Totally Clairvoyant scheduling constraint system; this problem is henceforth called P1.

6 Mutable Dynamic Programming

In this section, we propose an algorithm for P1 based on Mutable Dynamic Programming. The key idea is to find the path of least co-static cost (shortest path) from each vertex vi ∈ V to itself. By Theorem (1), we know that the constraint system is infeasible if and only if at least one of these paths has negative co-static cost. We motivate the development of our algorithm by classifying the edges that exist between vertices in the initial constraint graph; a small classification sketch follows this list. An edge vi ⇝ vj representing a constraint between jobs Ji and Jj must be one of the following types:
1. Type I edge: The weight of the edge does not depend upon either ei or ej, i.e., the corresponding constraint is expressed using only the start times of Ji and Jj. For instance, the edge corresponding to the constraint si + 4 ≤ sj is a Type I edge.
2. Type II edge: The weight of the edge depends upon both ei and ej, i.e., the corresponding constraint is expressed using only the finish times of Ji and Jj. For instance, the edge corresponding to the constraint si + ei + 8 ≤ sj + ej is a Type II edge.
3. Type III edge: The weight of the edge depends upon ei, but not on ej, i.e., the corresponding constraint is expressed using the finish time of Ji and the start time of Jj. For instance, the edge corresponding to the constraint si + ei + 25 ≤ sj is a Type III edge.
4. Type IV edge: The weight of the edge depends upon ej, but not on ei, i.e., the corresponding constraint is expressed using the start time of Ji and the finish time of Jj. For instance, the edge corresponding to the constraint si + 13 ≤ sj + ej is a Type IV edge.
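The four edge types are determined purely by which finish times the constraint mentions; the following tiny classifier (a hypothetical helper of ours) makes this concrete.

def edge_type(uses_ei, uses_ej):
    """Classify the vi ~> vj edge for a constraint s_i(+e_i) <= s_j(+e_j)+a:
    the edge weight (e_j) - (e_i) + a depends on e_i iff the constraint uses
    the finish time of J_i, and on e_j iff it uses the finish time of J_j."""
    if not uses_ei and not uses_ej:
        return "I"    # s_i <= s_j + a
    if uses_ei and uses_ej:
        return "II"   # s_i + e_i <= s_j + e_j + a
    if uses_ei:
        return "III"  # s_i + e_i <= s_j + a
    return "IV"       # s_i <= s_j + e_j + a

assert edge_type(uses_ei=True, uses_ej=False) == "III"  # s_i + e_i + 25 <= s_j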
Lemma 1. There exists at most one non-redundant vi ⇝ vj edge of Type II.

Proof: Without loss of generality, we assume that i < j, i.e., job Ji occurs in the sequence before job Jj. For the sake of contradiction, suppose that there exist two non-redundant Type II vi ⇝ vj edges; we denote the corresponding two constraints as l1 : si + ei + k1 ≤ sj + ej and l2 : si + ei + k2 ≤ sj + ej; note that they can be written as l1 : si − sj ≤ ej − ei − k1 and l2 : si − sj ≤ ej − ei − k2. Let us say that k1 ≥ k2, so that −k1 ≤ −k2. We now show that l2 can be eliminated from the constraint set without affecting its feasibility. Note that for any fixed values of ei and ej, l1 dominates l2 in the following sense: if l1 is satisfied, then l2 is also satisfied. Likewise, if there is a cycle of negative co-static cost through the edge representing l2, then there is a cycle of even lower co-static cost through the edge representing l1. Hence l2 can be eliminated from the constraint set without affecting its feasibility. The case in which i > j can be argued in similar fashion. □

Corollary 1. There exists at most one non-redundant vi ⇝ vj edge each of Types I, III and IV.

Proof: Identical to the proof of Lemma (1). □

Corollary 2. The number of non-redundant constraints in the initial constraint matrix, which is equal to the number of non-redundant edges in the initial constraint graph, is at most O(n^2).

Proof: It follows from Corollary (1) that there can exist at most 4 constraints between Ji and Jj, and hence at most 4 vi ⇝ vj edges between every pair of vertices vi, vj, i, j = 1, 2, . . . , n, i ≠ j. Hence the total number of edges in the initial constraint graph cannot exceed O(8 · n·(n−1)/2) = O(n^2). □

We extend the taxonomy of edges discussed above to classifying paths in a straightforward way; thus, a Type I path from vertex vi to vertex vj is a path whose cost does not depend on either ei or ej. Paths of Types II, III and IV are defined similarly.

Table 1. Computing the type of a path from the types of its sub-paths

vi ⇝ vk    vk ⇝ vj    vi ⇝ vj
Type I     Type I     Type I
Type I     Type II    Type IV
Type I     Type III   Type I
Type I     Type IV    Type IV
Type II    Type I     Type III
Type II    Type II    Type I (if j = i), Type II (if j ≠ i)
Type II    Type III   Type III
Type II    Type IV    Type I (if j = i), Type II (if j ≠ i)
Type III   Type I     Type III
Type III   Type II    Type I (if j = i), Type II (if j ≠ i)
Type III   Type III   Type III
Type III   Type IV    Type I (if j = i), Type II (if j ≠ i)
Type IV    Type I     Type I
Type IV    Type II    Type IV
Type IV    Type III   Type I
Type IV    Type IV    Type IV

Table 1 shows how to compute the type of a path, given the types of the sub-paths that constitute it. We restrict our attention to paths and cycles of Type I; we shall see that our arguments carry over to paths and cycles of the other types. As discussed above, there are at most 4 edges vi ⇝ vj for any vertex pair (vi, vj). We define

wij(I) = the symbolic cost of the Type I edge between vi and vj, if such an edge exists, and ∞ otherwise.

wij(II), wij(III) and wij(IV) are similarly defined. Note that Lemma (1) and Corollary (1) ensure that wij(R) is well-defined for R = I, II, III, IV. By convention, wii(R) = 0, i = 1, 2, . . . , n, R = I, III, IV; note that a path of Type II from a vertex vi to itself is actually a Type I path! Initialize the n × n × 4 matrix W as follows: W[i][j][R] = wij(R), i = 1, 2, . . . , n; j = 1, 2, . . . , n; R = I, II, III, IV. Note that the entries of W are not necessarily numbers; for instance, if there exists a constraint of the form si + ei + 7 ≤ sj + ej, then wij(II) = −ei + ej − 7.
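Since every symbolic cost handled by the algorithm is of the form r · e − k, a pair (r, k) suffices to represent an entry of W. The sketch below (our representation, not prescribed by the paper) also shows that adding two symbolic costs and evaluating min_E both take O(n) time, as the complexity analysis in Section 6.1 requires.

class SymCost:
    """A symbolic edge/path cost r . e - k, stored as (r, k); an illustrative
    representation. Adding two costs is O(n), matching the claim that the
    selection and addition procedures run in O(n) time."""
    def __init__(self, r, k):
        self.r, self.k = list(r), k

    def __add__(self, other):
        return SymCost([a + b for a, b in zip(self.r, other.r)],
                       self.k + other.k)

    def co_static(self, lowers, uppers):   # min over E of r . e - k
        return sum(ri * (li if ri >= 0 else ui)
                   for ri, li, ui in zip(self.r, lowers, uppers)) - self.k

# w_ij(II) for s_i + e_i + 7 <= s_j + e_j (n = 2, i = 0, j = 1): -e_0 + e_1 - 7
w01_II = SymCost([-1, 1], 7)
# its co-static cost over E = [2,5] x [1,4] is -5 + 1 - 7 = -11
assert w01_II.co_static([2, 1], [5, 4]) == -11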
Let p^k_ij(I) denote the path of Type I from vertex vi to vertex vj having the smallest co-static cost, with all intermediate vertices in the set {v1, v2, . . . , vk}, for some k > 0; note that p^0_ij(I) = wij(I). We refer to p^k_ij(I) as the shortest Type I path from vi to vj with all intermediate vertices in the set {v1, v2, . . . , vk}. Further, let c^k_ij(I) denote the co-static cost and d^k_ij(I) the corresponding symbolic cost; observe that c^k_ij(I) = min_E d^k_ij(I), and that given d^k_ij(I), c^k_ij(I) can be computed in O(n) time through substitution. The quantities p^k_ij(R), d^k_ij(R) and c^k_ij(R), R = II, III, IV, are defined similarly.

Let us study the structure of p^k_ij(I). We consider the following two cases.

(a) Vertex vk is not on p^k_ij(I). In this case, the shortest Type I path from vi to vj with all the intermediate vertices in {v1, v2, . . . , vk} is also the shortest Type I path from vi to vj with all the intermediate vertices in {v1, v2, . . . , vk−1}, i.e., p^k_ij(I) = p^{k−1}_ij(I) and d^k_ij(I) = d^{k−1}_ij(I).

(b) Vertex vk is on p^k_ij(I). Let us assume that j ≠ i, i.e., the path p^k_ij(I) is not a cycle. From Table 1, we know that one of the following must hold (see Figure (2)):

Fig. 2. Shortest path of Type I from vi to vj through vk

(a) vi ⇝ vk is of Type I and vk ⇝ vj is of Type I. Let p1 denote the sub-path of p^k_ij(I) from vi to vk and let p2 denote the sub-path of p^k_ij(I) from vk to vj. We claim that p1 must be the shortest Type I path from vi to vk with all the intermediate vertices in the set {v1, v2, . . . , vk−1}, i.e., p^{k−1}_ik(I). To see this, assume that p1 is not optimal and that there exists another Type I path of smaller co-static cost. Clearly, this path could be combined with the existing Type I path from vk to vj to get a shorter Type I path from vi to vj, contradicting the optimality of p^k_ij(I). The same argument holds for the optimality of p2. This property is called the Optimal Substructure property. We thus have p^k_ij(I) = p^{k−1}_ik(I) ∘ p^{k−1}_kj(I) and d^k_ij(I) = d^{k−1}_ik(I) + d^{k−1}_kj(I), where the ∘ operator indicates the combination of the two paths.
(b) vi ⇝ vk is of Type I and vk ⇝ vj is of Type III. We argue in a fashion similar to the above case to derive p^k_ij(I) = p^{k−1}_ik(I) ∘ p^{k−1}_kj(III) and d^k_ij(I) = d^{k−1}_ik(I) + d^{k−1}_kj(III).
(c) vi ⇝ vk is of Type IV and vk ⇝ vj is of Type I. It follows that p^k_ij(I) = p^{k−1}_ik(IV) ∘ p^{k−1}_kj(I) and d^k_ij(I) = d^{k−1}_ik(IV) + d^{k−1}_kj(I).
(d) vi ⇝ vk is of Type IV and vk ⇝ vj is of Type III. It follows that p^k_ij(I) = p^{k−1}_ik(IV) ∘ p^{k−1}_kj(III) and d^k_ij(I) = d^{k−1}_ik(IV) + d^{k−1}_kj(III).

Clearly, if vk is on p^k_ij(I), then

d^k_ij(I) = min_E { d^{k−1}_ik(I) + d^{k−1}_kj(I), d^{k−1}_ik(I) + d^{k−1}_kj(III), d^{k−1}_ik(IV) + d^{k−1}_kj(I), d^{k−1}_ik(IV) + d^{k−1}_kj(III) }  (7)

Remark 6.1. d^k_ij(I) represents the symbolic cost of the shortest Type I path from vi to vj, with all intermediate vertices in the set {v1, v2, . . . , vk}. Thus, the min_E operator is used merely to select the appropriate path pairs. In particular, in the calculation of d^k_ij(I), it does not reduce d^k_ij(I) to a numeric value, although c^k_ij(I) is a numeric value. It is this form of Dynamic Programming that we refer to as Mutable Dynamic Programming.

Putting the two cases together, we have

d^k_ij(I) = min_E { d^{k−1}_ij(I), d^{k−1}_ik(I) + d^{k−1}_kj(I), d^{k−1}_ik(I) + d^{k−1}_kj(III), d^{k−1}_ik(IV) + d^{k−1}_kj(I), d^{k−1}_ik(IV) + d^{k−1}_kj(III) }  (8)

Now consider the case that the path p^k_ij(I) is a cycle, i.e., j = i. From Table 1, we know that one of the following must hold:
1. vi ⇝ vk is of Type I and vk ⇝ vi is of Type I; this case has been handled above.
2. vi ⇝ vk is of Type II and vk ⇝ vi is of Type II, i.e., d^k_ii(I) = d^{k−1}_ik(II) + d^{k−1}_ki(II).
3. vi ⇝ vk is of Type II and vk ⇝ vi is of Type IV, i.e., d^k_ii(I) = d^{k−1}_ik(II) + d^{k−1}_ki(IV).
4. vi ⇝ vk is of Type III and vk ⇝ vi is of Type II, i.e., d^k_ii(I) = d^{k−1}_ik(III) + d^{k−1}_ki(II).
5. vi ⇝ vk is of Type III and vk ⇝ vi is of Type IV, i.e., d^k_ii(I) = d^{k−1}_ik(III) + d^{k−1}_ki(IV).

Note that the case k = 0 corresponds to the existence (or lack thereof) of a Type I edge from vi to vj. Thus, the final recurrence relation to calculate the cost of p^k_ij(R), R = I, II, III, IV, is:

d^k_ij(I) = wij(I), if k = 0;
         = min_E { d^{k−1}_ik(I) + d^{k−1}_ki(I), d^{k−1}_ik(II) + d^{k−1}_ki(II), d^{k−1}_ik(II) + d^{k−1}_ki(IV), d^{k−1}_ik(III) + d^{k−1}_ki(II), d^{k−1}_ik(III) + d^{k−1}_ki(IV) }, if j = i;
         = min_E { d^{k−1}_ij(I), d^{k−1}_ik(I) + d^{k−1}_kj(I), d^{k−1}_ik(I) + d^{k−1}_kj(III), d^{k−1}_ik(IV) + d^{k−1}_kj(I), d^{k−1}_ik(IV) + d^{k−1}_kj(III) }, otherwise.  (9)

Using similar analyses, we derive recurrence relations for d^k_ij(R), R = II, III, IV, as follows:

d^k_ij(II) = wij(II), if k = 0;
          = min_E { d^{k−1}_ij(II), d^{k−1}_ik(II) + d^{k−1}_kj(II), d^{k−1}_ik(II) + d^{k−1}_kj(IV), d^{k−1}_ik(III) + d^{k−1}_kj(II), d^{k−1}_ik(III) + d^{k−1}_kj(IV) }, otherwise.  (10)

d^k_ij(III) = wij(III), if k = 0;
           = min_E { d^{k−1}_ij(III), d^{k−1}_ik(II) + d^{k−1}_kj(I), d^{k−1}_ik(II) + d^{k−1}_kj(III), d^{k−1}_ik(III) + d^{k−1}_kj(I), d^{k−1}_ik(III) + d^{k−1}_kj(III) }, otherwise.  (11)

d^k_ij(IV) = wij(IV), if k = 0;
          = min_E { d^{k−1}_ij(IV), d^{k−1}_ik(I) + d^{k−1}_kj(II), d^{k−1}_ik(I) + d^{k−1}_kj(IV), d^{k−1}_ik(IV) + d^{k−1}_kj(II), d^{k−1}_ik(IV) + d^{k−1}_kj(IV) }, otherwise.  (12)

Note that for a specific k, the values of the execution time variables corresponding to the inner vertices of the path from vi to vk are fixed by the application of the min_E operator. Algorithm (6.1) summarizes the above discussion on the identification of a negative co-static cycle in the constraint graph G. We note that D^n_ij(I) represents the shortest Type I vi ⇝ vj path with all the intermediate vertices in the set {v1, v2, . . . , vn}, i.e., it is the shortest Type I vi ⇝ vj path. Eval-Loop() evaluates the co-static cost of each of the diagonal entries and declares G to be free of negative co-static cycles if all entries have non-negative cost. Further, we need not consider the case j = i separately in the computations of d^k_ij(R), R = II, III, IV.

Remark 6.2. We reiterate that the d^k_ij values are symbolic, while the c^k_ij values are numeric. The min_E operator is applied only to select the appropriate sub-path; the d^k_ij cost is computed in the symbolic sense only. Once the correct sub-paths have been selected, as per the principle of optimality, we can move on to the next stage. On account of the structure in the edge costs, the selection and addition procedures can be implemented in O(n) time.

6.1 Complexity

The complexity of Algorithm (6.1) is determined by Step 7 within the O(n^3) triple loop. It is easy to see that if the symbolic costs are stored in arrays, Step 7 can be implemented in O(n) time; it follows that Steps 1-10 can be implemented in time at most O(n^4). Step 13 takes time at most O(n), and hence Steps 11-18 take time at most O(n^2).
Function Detect-CoStatic-Negative-Cycle(G)
1:  Initialize W.
2:  Set D^0 = W.
3:  for (k = 1 to n) do
4:    {We are determining D^k}
5:    for (i = 1 to n) do
6:      for (j = 1 to n) do
7:        Compute D^k_ij(I), D^k_ij(II), D^k_ij(III), D^k_ij(IV) using the relations (9), (10), (11), (12).
8:      end for
9:    end for
10: end for
11: for (i = 1 to n) do
12:   for (R = I to IV) do
13:     if (Eval-Loop(D^n_ii(R)) < 0) then
14:       return(true)
15:     end if
16:   end for
17: end for
18: return(false)

Algorithm 6.1. Algorithm for identifying negative co-static cycles in the constraint graph

Theorem 2. The schedulability query for an instance of a Totally Clairvoyant scheduling problem with n jobs and m strict relative (standard) constraints can be decided in O(n^4) time.
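The following Python sketch assembles relations (9)-(12) and the Eval-Loop check into a runnable version of Algorithm 6.1. The (r, k) cost encoding, the INF marker and all names are our choices, not the paper's; we also carry the diagonal entry d^{k−1}_ii(I) forward at each stage (a conservative addition on our part, so that a negative cycle closed at an early stage survives to the final check).

I, II, III, IV = 0, 1, 2, 3
INF = None  # "no such path"

def add(c1, c2):
    if c1 is INF or c2 is INF:
        return INF
    (r1, k1), (r2, k2) = c1, c2
    return ([a + b for a, b in zip(r1, r2)], k1 + k2)

def co_static(c, lo, up):
    if c is INF:
        return float("inf")
    r, k = c
    return sum(ri * (l if ri >= 0 else u)
               for ri, l, u in zip(r, lo, up)) - k

def min_E(cands, lo, up):
    # selects the candidate of least co-static cost, keeping it symbolic
    return min(cands, key=lambda c: co_static(c, lo, up))

# Which (type(vi ~> vk), type(vk ~> vj)) pairs yield each path type,
# per Table 1 / relations (9)-(12) (the j != i case) ...
COMBINE = {
    I:   [(I, I), (I, III), (IV, I), (IV, III)],
    II:  [(II, II), (II, IV), (III, II), (III, IV)],
    III: [(II, I), (II, III), (III, I), (III, III)],
    IV:  [(I, II), (I, IV), (IV, II), (IV, IV)],
}
# ... and the pairs closing a Type I cycle (relation (9), j = i).
CYCLE = [(I, I), (II, II), (II, IV), (III, II), (III, IV)]

def detect_costatic_negative_cycle(n, w, lo, up):
    """w[i][j][T] is the symbolic cost (r, k) of the Type-T edge vi ~> vj,
    or INF; by convention w[i][i][T] = ([0]*n, 0) for T in (I, III, IV).
    Returns True iff the constraint graph has a negative co-static cycle."""
    D = [[[w[i][j][T] for T in range(4)] for j in range(n)] for i in range(n)]
    for k in range(n):
        Dn = [[[INF] * 4 for _ in range(n)] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for T in range(4):
                    if i == j and T == I:
                        # carry the best cycle found so far forward, then
                        # try closing a new cycle through vertex k
                        cands = [D[i][i][I]] + [add(D[i][k][a], D[k][i][b])
                                                for a, b in CYCLE]
                    else:
                        cands = [D[i][j][T]] + [add(D[i][k][a], D[k][j][b])
                                                for a, b in COMBINE[T]]
                    Dn[i][j][T] = min_E(cands, lo, up)
        D = Dn
    # Eval-Loop: any diagonal entry of negative co-static cost?
    return any(co_static(D[i][i][T], lo, up) < 0
               for i in range(n) for T in range(4))

# Two jobs with constraints s1 + 4 <= s2 and s2 <= s1 + 2: the cycle has
# co-static cost -4 + 2 = -2, so the system is infeasible.
n = 2
w = [[[INF] * 4 for _ in range(n)] for _ in range(n)]
for i in range(n):
    for T in (I, III, IV):
        w[i][i][T] = ([0] * n, 0)
w[0][1][I] = ([0, 0], 4)    # s1 - s2 <= -4
w[1][0][I] = ([0, 0], -2)   # s2 - s1 <=  2
assert detect_costatic_negative_cycle(n, w, [1, 1], [5, 5])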
7 Conclusions

In this paper, we discussed uncertainty issues in a real-time flow shop scheduling problem and designed a polynomial time algorithm for the same. The algorithm is based on a novel form of Dynamic Programming, called Mutable Dynamic Programming, which we believe may be useful in other application domains involving uncertainty and symbolic computation. Some interesting open theoretical questions are as follows: (a) What is the complexity of Totally Clairvoyant scheduling in the presence of more general constraints, such as Network Constraints [Sub01]? (b) What is the complexity of finding a schedule minimizing metrics such as Sum of Start Times and Sum of Completion Times? (c) Can we improve on the O(n^4) bound derived in this paper for testing Totally Clairvoyant schedulability? We once again reiterate the relevance of this technique to problems in symbolic model checking and verification: problems in these domains can be modeled as constraint verification problems, and Mutable Dynamic Programming is a new procedure for such problems.

References

[AB98] Alia Atlas and A. Bestavros. Design and implementation of statistical rate monotonic scheduling in KURT Linux. In Proceedings of the IEEE Real-Time Systems Symposium, December 1998.
[CA00] Seonho Choi and Ashok K. Agrawala. Dynamic dispatching of cyclic real-time tasks with relative timing constraints. Real-Time Systems, 19(1):5-40, 2000.
[CLR92] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press and McGraw-Hill Book Company, Boston, Massachusetts, 2nd edition, 1992.
[FW98] Amos Fiat and Gerhard Woeginger. Online Algorithms: The State of the Art, volume 1442 of Lecture Notes in Computer Science. Springer-Verlag, New York, NY, USA, 1998.
[GLLK79] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan. Optimization and approximation in deterministic sequencing and scheduling: A survey. Annals of Discrete Mathematics, 5:287-326, 1979.
[GPS95] Richard Gerber, William Pugh, and Manas Saksena. Parametric dispatching of hard real-time tasks. IEEE Transactions on Computers, 44(3):471-479, 1995.
[Hoc96] Dorit Hochbaum, editor. Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston, Massachusetts, 1996.
[KP00] B. Kalyanasundaram and K. Pruhs. Fault-tolerant real-time scheduling. Algorithmica, 28, 2000.
[LTCA89] S. T. Levi, S. K. Tripathi, S. D. Carson, and A. K. Agrawala. The Maruti hard real-time operating system. ACM SIGOPS Operating Systems Review, 23(3):90-106, July 1989.
[Pin95] M. Pinedo. Scheduling: Theory, Algorithms, and Systems. Prentice-Hall, Englewood Cliffs, 1995.
[Sch87] Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley and Sons, New York, 1987.
[Sub01] K. Subramani. Parametric scheduling for network constraints. In Jie Wang, editor, Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2108 of Lecture Notes in Computer Science, pages 550-560. Springer-Verlag, August 2001.
[Sub02] K. Subramani. An analysis of zero-clairvoyant scheduling. In Joost-Pieter Katoen and Perdita Stevens, editors, Proceedings of the 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 2280 of Lecture Notes in Computer Science, pages 98-112. Springer-Verlag, April 2002.
[Sub05] K. Subramani. A comprehensive framework for specifying clairvoyance, constraints and periodicity in real-time scheduling. The Computer Journal, 48(3):259-272, 2005.

Verification of Well-Formed Communicating Recursive State Machines

Laura Bozzelli(1), Salvatore La Torre(2), and Adriano Peron(1)

(1) Università di Napoli Federico II, Via Cintia, 80126 Napoli, Italy
(2) Università degli Studi di Salerno, Via S. Allende, 84081 Baronissi, Italy

This research was partially supported by the MIUR grant ex-60% 2003-2004, Università degli Studi di Salerno.

Abstract. In this paper we introduce a new (non-Turing-powerful) formal model of recursive concurrent programs called well-formed communicating recursive state machines (CRSM). CRSM extend recursive state machines (RSM) by allowing a restricted form of concurrency: a state of a module can be refined into a finite collection of modules (working in parallel) in a potentially recursive manner. Communication is only possible between the activations of modules invoked on the same fork. We study the model checking problem of CRSM with respect to specifications expressed in a temporal logic that extends CaRet with a parallel operator (ConCaRet). We propose a decision algorithm that runs in time exponential in both the size of the formula and the maximum number of modules that can be invoked simultaneously. This matches the known lower bound for deciding CaRet model checking of RSM, and therefore we prove that model checking CRSM with respect to ConCaRet specifications is Exptime-complete.

1 Introduction

Computer programs often involve the concurrent execution of multiple threads interacting with each other. Each thread can require recursive procedure calls and thus make use of a local stack. In general, combining recursion and task synchronization leads to Turing-equivalent models. Therefore, there have been essentially two approaches in the literature that address the problem of analyzing recursive concurrent programs: (i) abstraction (approximate) techniques on 'unrestricted models' (e.g., see [5, 14]), and (ii) precise techniques for 'weaker models' (with decidable reachability), obtained by imposing restrictions on the amount of parallelism and synchronization. In the second approach, many non-Turing-equivalent formalisms, suitable to model the control flow of recursive concurrent programs, have been proposed. One of the most powerful is constituted by Process Rewrite Systems (PRS, for short) [13], a formalism based on term rewriting which subsumes many common infinite-state models such as Pushdown Processes, Petri Nets, and PA processes. PRS can be adopted as a formal model for programs with dynamic creation and (a restricted form of) synchronization of concurrent processes, and with recursive procedures (possibly with return values).
PRS can accommodate both parallel-call commands and spawn commands for the dynamic creation of threads: in a parallel-call command, the caller thread suspends its activation waiting for the termination of all called processes, while in a spawn command, the caller thread can pursue its execution concurrently with the newly activated threads. However, there is a price to pay for this expressive power: for the general framework of PRS, the only decidability results known in the literature concern the reachability problem between two given terms and the reachable property problem [13]. Model checking against standard propositional temporal logics is undecidable even for small fragments. Moreover, note that the best known upper bound for reachability in Petri nets (which represent a subclass of PRS) requires non-primitive-recursive space [9]. In [6], symbolic reachability analysis is investigated, and the given algorithm can be applied only to a strict subclass of PRS, i.e., the synchronization-free PRS (the so-called PAD systems), which subsume Pushdown Processes, synchronization-free Petri nets, and PA processes. In [10, 11], symbolic reachability analysis of PA processes is used to allow the interprocedural data-flow analysis of programs represented by systems of parallel flow graphs (parallel FGS, for short), which extend classical sequential flow graphs by parallel-call commands. In [7], Dynamic Pushdown Networks (DPN) are proposed for flow analysis of multithreaded programs. DPN allow spawn commands and can encode parallel FGS. An extension of this model that captures the modelling power of PAD is also considered.

In this paper, we consider a different abstract model of concurrent programs with finite-domain variables and recursive procedures, the well-formed communicating recursive state machines (CRSM). A CRSM is an ordered collection of finite-state machines (called modules) where a state can represent a call, in a potentially recursive manner, to a finite collection of modules running in parallel. A parallel call to other modules models the activation of multiple threads in a concurrent program (fork). When a fork happens, the execution of the current module is stopped and the control moves into the modules activated in the fork. On termination of such modules, the control returns to the calling module and its execution is resumed (join). CRSM allow communication only between module instances that are activated on the same fork and do not have ongoing procedure calls. In our model, we allow multiple entries and exits for each module, which can be used to handle finite-domain local variables and return values from procedure calls (see [1]). Intuitively, CRSM correspond to the subclass of PRS obtained by disallowing rewrite rules which model spawn commands. Also, note that CRSM extend parallel FGS, since they also allow (a restricted form of) synchronization and return values from (parallel) procedure calls. With respect to DPN, CRSM allow synchronization. Moreover, CRSM extend both the recursive state machines (RSM) [1], by allowing parallelism, and the well-structured communicating (finite-state) hierarchical state machines [3], by allowing recursion. We recall that RSM correspond to pushdown systems, and the related model checking problem has been extensively studied in the literature [17, 4, 1]. CRSM are strictly more expressive
than RSM. In fact, it is possible to prove that synchronization-free CRSM correspond to a complete normal form of Ground Tree Rewriting systems [8], and thus the related class of languages is located in the Chomsky hierarchy strictly between the context-free and the context-sensitive languages [12].

In this paper, we address the model-checking problem of CRSM with respect to specifications expressed in a new logic that we call ConCaRet. ConCaRet is a linear-time temporal logic that extends CaRet [2] with a parallel operator. Recall that CaRet extends standard LTL by allowing the specification of non-regular context-free properties that are useful to express correctness of procedures with respect to pre- and post-conditions. The model checking of RSM with respect to CaRet specifications is known to be Exptime-complete [2]. The semantics of ConCaRet is given with respect to (infinite) computations of CRSM. In our model, threads can fork; thus, a global state (i.e., the stack content and the current local state of each active thread) of a CRSM is represented as a ranked tree, and a computation corresponds to a sequence of these ranked trees. As in CaRet, we consider three different notions of local successor for any local state along a computation, and the corresponding counterparts of the usual temporal operators of LTL: the abstract successor captures the local computation within a module, removing computation fragments corresponding to nested calls within the module; the caller denotes the local state (if any) corresponding to the "innermost call" that has activated the current local state; a (local) linear successor of a local state is the usual notion of successor within the corresponding thread. In case the current local state corresponds to a fork, its successors give the starting local states of the activated threads. With respect to the paths generated by the linear successors, we allow the standard LTL modalities coupled with existential and universal quantification. Note that ConCaRet has no temporal modalities for the global successor (i.e., the next global state in the run); in fact, this operator would allow us to model unrestricted synchronization among threads, and the model checking of ConCaRet formulas would become undecidable. In ConCaRet we also allow a parallel operator that can express properties about communicating modules, such as "every time a resource p is available for a process I, it will eventually be available for all the processes in parallel with I" (in formulas, 2(p → ∥3^a p)).

We show that model checking CRSM against ConCaRet is decidable. Our approach is based on automata-theoretic techniques: given a CRSM S and a formula ϕ, we construct a Büchi CRSM S¬ϕ (i.e., a CRSM equipped with generalized Büchi acceptance conditions) such that model checking S against ϕ reduces to checking the emptiness of the Büchi CRSM S¬ϕ. We solve this last problem by a non-trivial reduction to the emptiness problem for a straightforward variant of classical Büchi tree automata. Our construction of S¬ϕ extends the construction given in [2] for an RSM and a CaRet formula. Overall, our model checking algorithm runs in time exponential in both the maximal number ρ of modules that can be invoked simultaneously and the size of the formula, and thus matches the known lower bound for deciding CaRet model checking.
Therefore, we prove that the model-checking problem of CRSM with respect to ConCaRet Verification of Well-Formed Communicating Recursive State Machines 415 specifications is Exptime-complete. The main difference w.r.t. RSM is the time complexity in the size of the model that for RSM is polynomial, while for CRSM, it is exponential in ρ. Due to the lack of space, for the omitted details we refer the reader to [18]. 2 Well-Formed Communicating Recursive State Machines In this section we define syntax and semantics of well-formed Communicating Recursive State Machines (CRSM, for short). Syntax. A CRSM is an ordered collection of finite–state machines (FSM) augmented with the ability of refining a state with a collection of FSM (working in parallel) in a potentially recursive manner. Definition 1. A CRSM S over a set of propositions AP is a tuple (S1 , . . . , Sk ), start, where for 1 ≤ i ≤ k, Si = Σi , Σis , Ni , Bi , Yi , Eni , Exi , δi , ηi  is a module k and start ⊆ i=1 Ni is a set of start nodes. Each module Si is defined as follows: – Σi is a finite alphabet and Σis ⊆ Σi is the set of synchronization symbols; – Ni is a finite set of nodes and Bi is a finite set of boxes (with Ni ∩ Bi = ∅); – Yi : Bi → {1, . . . , k}+ is the refinement function which associates with every box a sequence of module indexes; – Eni ⊆ Ni (resp., Exi ⊆ Ni ) is a set of entry nodes (resp., exit nodes); – δi : (Ni ∪Retnsi )×Σi → 2Ni ∪Callsi is the transition function, where Callsi = {(b, e1 , . . . , em ) | b ∈ Bi , ej ∈ Enhj f or any 1 ≤ j ≤ m, and Yi (b) = h1 . . . hm } denotes the set of calls and Retnsi = {(b, x1 , . . . , xm ) | b ∈ Bi , xj ∈ Exhj f or any 1 ≤ j ≤ m, and Yi (b) = h1 . . . hm } denotes the set of returns of Si ; we assume w.l.o.g. that exits have no outgoing transitions, and entries have no incoming transitions, and Eni ∩ Exi = ∅; – ηi : Vi → 2AP is the labelling function, with Vi = Ni ∪ Callsi ∪ Retnsi (Vi is the set of vertices). i=k We assume that (Vi ∪ Bi ) ∩ (Vj ∪ Bj ) = ∅ for i = j. Also, let Σ = i=1 Σi ,  i=k   s V , Calls = i=k Callsi , Retns = i=k Σ s = i=k i=1 Σi , V = i=1 Retnsi , i=k i=ki=1 i i=k i=1 i=k N = i=1 Ni , B = i=1 Bi , En = i=1 Eni , and Ex = i=1 Exi . Functions η : V → 2AP , Y : B → {1, . . . , k}+ , and δ : (N ∪ Retns) × Σ → 2N ∪Calls are defined as the natural extensions of functions ηi , Yi and δi (with 1 ≤ i ≤ k). The set of states of a module is partitioned into a set of nodes and a set of boxes. Performing a transition to a box b can be interpreted as a (parallel) procedure call (fork ) which simultaneously activates a collection of modules (the list of modules given by the refinement function Y applied to b). Since Y gives a list of module indexes, a fork can activate different instances of the same module. Note that a transition leading to a box specifies the entry node (initial state) of each activated module. All the module instances, which are simultaneously activated in a call, run in parallel, whereas the calling module instance suspends its activity waiting for the return of the (parallel) procedure call. The return 416 L. Bozzelli, S. La Torre, and A. Peron of a parallel procedure call to a box b is represented by a transition from b that specifies an exit node (exiting state) for each module copy activated by the procedural call (a synchronous return or join from all the activated modules). Figure 1 depicts a simple CRSM consisting of two modules S1 and S2 . 
Figure 1 depicts a simple CRSM consisting of two modules S1 and S2. Module S1 has two entry nodes u1 and u2, an exit node u4, an internal node u3, and one box b1 that is mapped to the parallel composition of two copies of S2. The module S2 has an entry node w1, an exit node w2, and two boxes b2 and b3, both mapped to one copy of S1. The transition from node u1 to box b1 in S1 is represented by a fork transition having u1 as source and the entry nodes of the two copies of S2 as targets. Similarly, the transition from box b1 to node u4 is represented by a join transition having the exit nodes of the two copies of S2 as sources and u4 as target. In our model, communication is allowed only between module instances that are activated on the same fork and are not busy in a (parallel) procedure call. As for the communicating (finite-state) hierarchical state machines [3], the form of communication we allow is synchronous and maximal, in the sense that if a component (module) takes a transition labelled by a synchronization symbol σ, then each other parallel component which has σ in its synchronization alphabet must be able to take a transition labelled by σ. For instance, assuming that the symbols σ1 and σ2 in Figure 1 are synchronization symbols, the two copies of S2 which refine box b1 either both take the transition labelled by σ1, activating a copy of S1 with start node u1, or both take the transition labelled by σ2, activating a copy of S1 with start node u2. Transitions labelled by symbols in Σ \ Σ^s are instead performed independently, without any synchronization requirement. A synchronization-free CRSM is a CRSM in which Σ^s = ∅. The rank of S, written rank(S), is the maximum of {|Y(b)| | b ∈ B}. Note that if rank(S) = 1, then S is a recursive state machine (RSM) as defined in [1].

[Figure: the modules S1 and S2 described above, with fork/join transitions into the boxes b1 : S2 S2, b2 : S1 and b3 : S1.]
Fig. 1. A sample CRSM

Semantics. We give some notation first. A tree t is a prefix-closed subset of N* such that if y · i ∈ t, then y · j ∈ t for any 0 ≤ j < i. The empty word ε is the root of t. The set of leaves of t is leaves(t) = {y ∈ t | y · 0 ∉ t}. For y ∈ t, the set of children of y (in t) is children(t, y) = {y · i ∈ t} and the set of siblings of y (in t) is siblings(t, y) = {y} ∪ {y' · i ∈ t | y = y' · j for some j ∈ N}. For y, y' ∈ N*, we write y ≺ y' to mean that y is a proper prefix of y'.

The semantics of a CRSM S is defined in terms of a labelled transition system KS = ⟨Q, R⟩. Q is the set of (global) states, which correspond to activation hierarchies of instances of modules and are represented by finite trees whose locations are labelled with vertices and boxes of the CRSM. Leaves correspond to active modules, and a path in the tree leading to a leaf y (excluding y) corresponds to the local call stack of the module instance associated with y. Formally, a state is a pair of the form (t, D), where t is a (finite) tree and D : t → B ∪ V is defined as follows (a small sketch of this encoding follows the definition):
– if y ∈ leaves(t), then D(y) ∈ V (i.e., a vertex of S);
– if y ∈ t \ leaves(t) and children(t, y) = {y · 0, . . . , y · m}, then D(y) = b ∈ B, Y(b) = h0 . . . hm, and D(y · j) ∈ B_{hj} ∪ V_{hj} for any j = 0, . . . , m.
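The tree-shaped global states (t, D) admit a compact encoding with tuple positions; the sketch below (our encoding, with hypothetical vertex names) also illustrates the module-call transition described next, which expands a leaf into a box with one child per activated module.

def leaves(t):
    return {y for y in t if y + (0,) not in t}

def module_call(t, D, y, call):
    """Expand leaf y, currently labelled with call = (b, e_0, ..., e_m):
    y becomes an internal position labelled b, with one child per entry."""
    b, entries = call[0], call[1:]
    t2 = set(t) | {y + (j,) for j in range(len(entries))}
    D2 = dict(D)
    D2[y] = b
    for j, e in enumerate(entries):
        D2[y + (j,)] = e
    return t2, D2

# the root is the empty tuple (); a call to two copies of S2 via box b1
t, D = {()}, {(): ("b1", "w1", "w1")}
t, D = module_call(t, D, (), D[()])
assert leaves(t) == {(0,), (1,)} and D[()] == "b1"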
The global transition relation R ⊆ Q × (2^{N*} × 2^{N*}) × Q is a set of tuples of the form ⟨(t, D), (ℓ, ℓ'), (t', D')⟩, where (t, D) (resp., (t', D')) is the source (resp., target) of the transition, and ℓ keeps track of the elements of t corresponding to the (instances of) modules of S performing the transition; for an internal move ℓ' = ℓ, for a call ℓ' points to the modules activated by the (parallel) procedure call, and for a return from a call ℓ' points to the reactivated caller module. Formally, ⟨(t, D), (ℓ, ℓ'), (t', D')⟩ ∈ R iff one of the following holds:

Single internal move: t' = t, there is y ∈ leaves(t) and σ ∈ Σ \ Σ^s such that ℓ = ℓ' = {y}, D'(y) ∈ δ(D(y), σ), and D'(z) = D(z) for any z ∈ t \ {y}.

Synchronous internal move: t' = t, there are y ∈ t with siblings(t, y) = {y1, . . . , ym}, σ ∈ Σ^s, and indexes k1, . . . , kp ∈ {1, . . . , m} such that the following holds: ℓ = ℓ' = {y_{k1}, . . . , y_{kp}} ⊆ leaves(t), D'(z) = D(z) for any z ∈ t \ {y_{k1}, . . . , y_{kp}}, D'(y_{kj}) ∈ δ(D(y_{kj}), σ) for any 1 ≤ j ≤ p, and for any j ∈ {1, . . . , m} \ {k1, . . . , kp}, σ is not a synchronization symbol of the module associated with D(yj).

Module call: there is y ∈ leaves(t) such that D(y) = (b, e0, . . . , em) ∈ Calls, t' = t ∪ {y · 0, . . . , y · m}, ℓ = {y}, ℓ' = {y · 0, . . . , y · m}, D'(z) = D(z) for any z ∈ t \ {y}, D'(y) = b, and D'(y · j) = ej for any 0 ≤ j ≤ m.

Return from a call: there is y ∈ t \ leaves(t) such that ℓ = children(t, y) = {y · 0, . . . , y · m} ⊆ leaves(t), ℓ' = {y}, (D(y), D(y · 0), . . . , D(y · m)) ∈ Retns, t' = t \ {y · 0, . . . , y · m}, D'(z) = D(z) for any z ∈ t' \ {y}, and D'(y) = (D(y), D(y · 0), . . . , D(y · m)).

For v ∈ V, we denote with v the global state ({ε}, D) where D(ε) = v. A run of S is an infinite path in KS from a state of the form v.

2.1 Local Successors

Since we are interested in the local transformation of module instances, as in [2] we introduce different notions of local successor of module instances along a run. We fix a CRSM S and a run of S, π = q0 →^{(ℓ0, ℓ'0)} q1 →^{(ℓ1, ℓ'1)} q2 . . . , with qi = (ti, Di) for any i. We denote by Qπ the set {(i, y) | y ∈ leaves(ti)}. An element (i, y) of Qπ, called a local state of π, represents an instance of a module that at state qi is
For internal moves, the abstract successor coincides with the (unique) linear successor, i.e., if nextπ (i, y) = {(j, y)}, then nextaπ (i, y) = (j, y) (note that in this case Di (y) ∈ Retns ∪ N \ Ex). In all the other cases, the abstract successor is not defined and we denote this with nextaπ (i, y) = ⊥. The abstract successor captures the local computations inside a module A skipping over invocations of other modules called from A. Besides linear and abstract successor, we also define a caller of a local state (i, y) as the ‘innermost call’ that has activated (i, y). Formally, the caller of (i, y) (if any), written next− π (i, y), is defined as follows (notice that only local states of the form (i, ε) have no callers): if y = y  · m for some m ∈ N, then    next− π (i, y) = (j, y ) where j is the maximum of {h < i | y ∈ th and Dh (y ) ∈ − − Calls}; otherwise, nextπ (i, y) is undefined, written nextπ (i, y) = ⊥. The above defined notions allow us to define sequences of local moves (i.e. moves affecting local states) in a run. For (i, y) ∈ Qπ , the set of linear paths of π starting from (i, y) is the set of (finite or infinite) sequences of local states r = (j0 , y0 )(j1 , y1 ) . . . such that (j0 , y0 ) = (i, y), (jh+1 , yh+1 ) ∈ nextπ (jh , yh ) for any h, and either r is infinite or leads to a local state (jp , yp ) such that nextπ (jp , yp ) = ∅. Analogously, the notion of abstract path (resp. caller path) of π starting from (i, y) can be defined by using in the above definition the abstract successor (resp. caller) instead of the linear successor. Note that a caller path is always finite and uniquely determines the content of the call stack locally to the instance of the module active at (i, y). For module instances involved in a call, i.e., corresponding to pairs (i, y) such that y ∈ ti \ leaves(ti ), we denote the local state (if any) at which the call pending at (i, y) will return by returnπ (i, y). Formally, if y ∈ leaves(tj ) for some j > i, then returnπ (i, y) = {(h, y)} where h is the smallest of such j, otherwise returnπ (i, y) = ⊥. Also, we denote the local state corresponding to the call activating the module instance at (i, y) by callπ (i, y). Formally, callπ (i, y) := (h, y) where h is the maximum of {j < i | y ∈ tj and Dj (y) ∈ Calls}. Verification of Well-Formed Communicating Recursive State Machines 419 Let Evolve be the predicate over sets of vertices and boxes defined as follows: Evolve({v1 , . . . , vm }) holds iff either (1) there are σ ∈ Σ \ Σ s and 1 ≤ i ≤ m such that vi ∈ N ∪ Retns and δ(vi , σ) = ∅ (single internal move), or (2) there is σ ∈ Σ s such that the set H = {1 ≤ i ≤ m | vi ∈ N ∪ Retns and δ(vi , σ) = ∅} is not empty and for each i ∈ {1, . . . , m} \ H, σ does not belong to the synchronization alphabet of the module associated with vi (synchronized internal move). In the following, we focus on maximal runs of CRSM. Intuitively, a maximal run represents an infinite computation in which each set of module instances activated by the same parallel call that may evolve (independently or by a synchronous internal move) is guaranteed to make progress. Formally, a run π is maximal if for all (i, y) ∈ Qπ , the following holds: – if Di (y) ∈ Calls, then nextπ (i, y) = ∅ (a possible module call must occur); – if y = ε and Di (y  ) ∈ Ex for all y  ∈ siblings(ti, y), then nextπ (i, y) = ∅ (i.e., if a return from a module call is possible, then it must occur); – if y = ε, siblings(ti, y) = {y0 , . . . 
, ym}, and for each 0 ≤ j ≤ m, nextπ(i, yj) = ∅ if (i, yj) ∈ Qπ and returnπ(i, yj) = ⊥ otherwise, then the condition Evolve({Di(y0), . . . , Di(ym)}) does not hold.

3 The Temporal Logic ConCaRet

Let AP be a finite set of atomic propositions. The logic ConCaRet over AP is the set of formulas inductively defined as follows:

ϕ ::= p | call | ret | int | ¬ϕ | ϕ ∨ ϕ | ○^b ϕ | ϕ U^b ϕ | ∥ϕ

where b ∈ {∃, ∀, a, −} and p ∈ AP. A formula is interpreted over runs of a CRSM S = ⟨(S1, . . . , Sk), start⟩. Let π = q0 →^{(ℓ0,ℓ'0)} q1 →^{(ℓ1,ℓ'1)} . . . be a run of S, where qi = (ti, Di) for i ≥ 0. The truth value of a formula w.r.t. a local state (i, y) of π is defined as follows (an AST sketch follows this definition):
– (i, y) |=π p iff p ∈ η(Di(y)) (where p ∈ AP);
– (i, y) |=π call (resp., ret, int) iff Di(y) ∈ Calls (resp., Di(y) ∈ Retns, Di(y) ∈ N);
– (i, y) |=π ¬ϕ iff it is not the case that (i, y) |=π ϕ;
– (i, y) |=π ϕ1 ∨ ϕ2 iff either (i, y) |=π ϕ1 or (i, y) |=π ϕ2;
– (i, y) |=π ○^b ϕ (with b ∈ {a, −}) iff next^b_π(i, y) ≠ ⊥ and next^b_π(i, y) |=π ϕ;
– (i, y) |=π ○^∃ ϕ iff there is (j, y') ∈ nextπ(i, y) such that (j, y') |=π ϕ;
– (i, y) |=π ○^∀ ϕ iff for all (j, y') ∈ nextπ(i, y), (j, y') |=π ϕ;
– (i, y) |=π ϕ1 U^a ϕ2 (resp., ϕ1 U^− ϕ2) iff, given the abstract (resp., caller) path (j0, y0)(j1, y1) . . . starting from (i, y), there is h ≥ 0 such that (jh, yh) |=π ϕ2 and for all 0 ≤ p ≤ h − 1, (jp, yp) |=π ϕ1;
– (i, y) |=π ϕ1 U^∃ ϕ2 (resp., ϕ1 U^∀ ϕ2) iff for some linear path (resp., for all linear paths) (j0, y0)(j1, y1) . . . starting from (i, y), there is h ≥ 0 such that (jh, yh) |=π ϕ2 and for all 0 ≤ p ≤ h − 1, (jp, yp) |=π ϕ1;
– (i, y) |=π ∥ϕ iff for all y' ∈ siblings(th, y) \ {y}, with h = min{j ≤ i | (j, y) ∈ Qπ and nextπ(j, y) = nextπ(i, y)}, it holds that:
  − if Dh(y') ∈ V, then (h, y') |=π ϕ;
  − if Dh(y') ∈ B, then callπ(h, y') |=π ϕ.
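The grammar above maps directly onto an abstract syntax tree. The following sketch (constructor names are ours) fixes one possible representation, together with the derived operator 2^∀ used below.

from dataclasses import dataclass
from typing import Any

@dataclass
class Prop:
    name: str              # atomic proposition p

@dataclass
class Atom:
    which: str             # "call", "ret" or "int"

@dataclass
class Not:
    arg: Any

@dataclass
class Or:
    left: Any
    right: Any

@dataclass
class Next:
    b: str                 # "E" (exists), "A" (forall), "a" or "-"
    arg: Any

@dataclass
class Until:
    b: str
    left: Any
    right: Any

@dataclass
class Par:
    arg: Any               # the parallel operator ||

def always_forall(psi):
    """2^A psi, via the derived operators 3^E chi = true U^E chi and
    2^A chi = ~3^E ~chi; 'true' is encoded here as p \/ ~p."""
    true = Or(Prop("p0"), Not(Prop("p0")))
    return Not(Until("E", true, Not(psi)))

# 2^A(call -> O^A phi): every module activated by any call satisfies phi
invariance = always_forall(Or(Not(Atom("call")), Next("A", Prop("phi"))))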
We say that the run π satisfies a formula ϕ, written π |= ϕ, if (0, ε) |=π ϕ (recall that t0 = {ε}). Moreover, we say that S satisfies ϕ, written S |= ϕ, iff for any u ∈ start and for any maximal run π of S starting from u, it holds that π |= ϕ. Now, we can define the model-checking question we are interested in:

Model checking problem: Given a CRSM S and a ConCaRet formula ϕ, does S |= ϕ?

For each type of local successor (forward or backward), the logic provides the corresponding versions of the usual (global) next operator ○ and until operator U. For instance, formula ○^− ϕ demands that the caller of the current local state satisfies ϕ, while ϕ1 U^− ϕ2 demands that the caller path (which is always finite) from the current local state satisfies ϕ1 U ϕ2. Moreover, the linear modalities are branching-time, since they quantify over the possible linear successors of the current local state; thus, we have both existential and universal linear versions of the standard modalities ○ and U. Finally, the operator ∥ is a new modality introduced to express properties of parallel modules. The formula ∥ϕ holds at a local state (i, y) of a module instance I iff, h ≤ i being the time at which the vertex Di(y) was first entered such that I has been idle from h to i, any module instance (different from I) in parallel with I and not busy in a module call (at time h) satisfies ϕ at time h, and any module instance in parallel with I and busy in a module call (at time h) satisfies ϕ at the call time. Note that the semantics of the parallel operator ensures the following desirable property for the logic ConCaRet: for each pair of local states (i, y) and (j, y) such that i < j and nextπ(j, y) = nextπ(i, y) (i.e., associated with a module instance which remains idle from i to j), the set of formulas that hold at (i, y) coincides with the set of formulas that hold at (j, y).

We conclude this section by illustrating some interesting properties which can be expressed in ConCaRet. In the following, as in standard LTL, we use 3^b ϕ as an abbreviation for true U^b ϕ, for b ∈ {∃, ∀, a, −}. Further, for b ∈ {a, −}, let 2^b ϕ stand for ¬3^b ¬ϕ, let 2^∀ ϕ stand for ¬3^∃ ¬ϕ, and let 2^∃ ϕ stand for ¬3^∀ ¬ϕ. Besides the stack inspection properties and pre/post-conditions of local computations of a module, as in CaRet [2], in ConCaRet we can express pre- and post-conditions for multiple threads activated in parallel. For instance, we can require that whenever module A and module B are both activated in parallel and pre-condition p holds, then A and B need to terminate, and post-condition q is satisfied upon the synchronous return (parallel total correctness). Assuming that parallel calls to the modules A and B are characterized by the proposition pA,B, this requirement can be expressed by the formula 2^∀((call ∧ p ∧ pA,B) → ○^a q).

Linear modalities refer to the branching-time structure of CRSM computations. They can be used, for instance, to express invariance properties of the kind "every time a call occurs, each activated module has to satisfy property ϕ"; such a property can be written as 2^∀(call → ○^∀ ϕ). We can also express simple global eventuality properties of the kind "every time the computation starts from module A, module B will eventually be activated", expressed by the formula tA → 3^∃ tB. However, we can express more interesting global properties, such as recurrence requirements. For instance, formula 3^∀ 2^∀(call → ○^∀ ¬tA) asserts that module A is activated only a finite number of times; the negation of this formula therefore expresses an interesting global recurrence property: along any maximal infinite computation, module A is activated infinitely many times. The parallel modality can express alignment properties among parallel threads. For instance, formula 2^∀(call → ○^∃ 3^a(φ ∧ ∥φ)) requires that when a parallel call occurs, there must be an instant in the future at which the same property φ holds in all the parallel threads activated by the parallel call. In particular, with such a formula we could require the existence of a future time at which all threads activated by the call will be ready for a maximal synchronization on a symbol. More generally, with the parallel operator we can express reactivity properties of modules, namely the ability of a module to continuously interact with its parallel components. We can also express mutual exclusion properties: among the modules activated in a same procedure call, at most one module can access a shared resource p (in formulas, 2^∀(p → ∥¬p)).

4 Büchi CRSM

In this section, we extend CRSM with acceptance conditions and address the emptiness problem for the resulting class of machines (i.e., the problem of deciding the existence of an accepting maximal run from a start node).
Besides standard acceptance conditions on the finite linear paths of a maximal run π, we require a synchronized acceptance condition on modules running in parallel and a generalized Büchi acceptance condition on the infinite linear paths of π. We call this model a Büchi CRSM (B-CRSM, for short). Formally, a B-CRSM S = ⟨(S1, . . . , Sk), start, F, P_F, Psync⟩ consists of a CRSM ⟨(S1, . . . , Sk), start⟩ together with the following acceptance conditions:
– F = {F1, . . . , Fn} is a family of accepting sets of vertices of S;
– P_F is the set of terminal vertices;
– Psync is a predicate defined over pairs (v, H) such that v is a vertex and H is a set of vertices with |H| ≤ rank(S).

Let π = q0 →^{(ℓ0,ℓ'0)} q1 →^{(ℓ1,ℓ'1)} q2 . . . be a run of S, with qi = (ti, Di) for i ≥ 0. For each i ≥ 0 and y ∈ ti, we denote by v(i, y) the vertex of S defined as follows: if (i, y) ∈ Qπ (i.e., (i, y) is a local state), then v(i, y) := Di(y); otherwise, v(i, y) := Dh(y), where (h, y) := callπ(i, y). We say that the run π is accepting iff the following three conditions are satisfied:
1. for any infinite linear path r = (i0, y0)(i1, y1) . . . of π and any F ∈ F, there are infinitely many h ∈ N such that D_{ih}(yh) ∈ F (generalized Büchi acceptance);
2. for any local state (i, y) ∈ Qπ such that nextπ(i, y) = ∅, the condition Di(y) ∈ P_F holds (terminal acceptance);
3. for any i ≥ 0 and y ∈ ℓ'i, Psync(v(i + 1, y), {v(i + 1, y1), . . . , v(i + 1, y_{mi})}) holds, where {y1, . . . , y_{mi}} is siblings(t_{i+1}, y) \ {y} (synchronized acceptance).

We say that the run π is monotone iff for all i ≥ 0, q_{i+1} is obtained from qi either by a module call or by an internal move. Note that in a monotone path, either the tree t_{i+1} is equal to ti (for an internal move) or it is obtained from ti by adding some children to a leaf (for a module call). We decide the emptiness problem for B-CRSM in two main steps:
1. First, we give an algorithm to decide the existence of accepting monotone maximal runs starting from a given vertex;
2. Then, we reduce the emptiness problem to the problem addressed in Step 1.

4.1 Deciding the Existence of Accepting Monotone Maximal Runs

We show how to decide the existence of accepting monotone maximal runs in B-CRSM by a reduction to the emptiness problem for invariant Büchi tree automata. These differ from the standard formalism of Büchi tree automata [15] in that the set of states is partitioned into invariant and non-invariant states, with the constraint that transitions from non-invariant to invariant states are forbidden; also, the standard Büchi acceptance condition is strengthened by requiring, in addition, that an accepting run have a path consisting of invariant states only. Formally, an (alphabet-free) invariant Büchi tree automaton is a tuple U = ⟨D, P, P0, M, F, Inv⟩, where D ⊂ N \ {0} is a finite set of branching degrees, P is the finite set of states, P0 ⊆ P is the set of initial states, M : P × D → 2^{P*} is the transition function with M(s, d) ∈ 2^{P^d} for all (s, d) ∈ P × D, F ⊆ P is the Büchi condition, and Inv ⊆ P is the invariance condition. Also, for any s ∈ P \ Inv and d ∈ D, we require that if s' occurs in M(s, d), then s' ∈ P \ Inv. A complete D-tree is an infinite tree t ⊆ N* such that for any y ∈ t, the cardinality of children(t, y) belongs to D. A path of t is a maximal subset of t linearly ordered by ≺.
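As a container for the reduction, the tuple U = ⟨D, P, P0, M, F, Inv⟩ can be captured as follows (field names are ours); the well_formed check encodes the stated requirement that successors of non-invariant states never re-enter Inv.

from dataclasses import dataclass

@dataclass
class InvariantBuchiTA:
    degrees: set     # D, finite set of branching degrees
    states: set      # P
    initial: set     # P0, a subset of P
    trans: dict      # M: (state, d) -> set of d-tuples over P
    buchi: set       # F, the Buchi condition
    inv: set         # Inv, the invariance condition

    def well_formed(self):
        # transitions from P \ Inv may only lead back into P \ Inv
        return all(s in self.inv
                   or all(q not in self.inv
                          for tup in self.trans.get((s, d), ())
                          for q in tup)
                   for s in self.states for d in self.degrees)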
A run of U is a pair (t, r), where t is a complete D-tree and r : t → P is a P-labelling of t such that r(ε) ∈ P0 and, for all y ∈ t, (r(y · 0), r(y · 1), . . . , r(y · d)) ∈ M(r(y), d + 1), where d + 1 = |children(t, y)|. The run (t, r) is accepting iff: (1) there is a path ν of t such that for every y ∈ ν, r(y) ∈ Inv, and (2) for any path ν of t, the set {y ∈ ν | r(y) ∈ F} is infinite. The algorithm in [16] for checking emptiness of Büchi tree automata can easily be extended to handle the invariance condition as well; thus, we obtain the following.

Proposition 1. The emptiness problem for invariant Büchi tree automata is logspace-complete for Ptime and can be decided in quadratic time.

In the following, we fix a B-CRSM S = ⟨(S1, . . . , Sk), start, F, P_F, Psync⟩.

Remark 1. Apart from a preliminary step computable in linear time (in the size of S), we can restrict ourselves to consider only accepting monotone maximal runs π of S starting from call vertices. In fact, if π starts at a non-call vertex v of a module Sh, then either π stays within Sh forever, or π enters a call v' of Sh that never returns. In the first case, one has to check the existence of an accepting run in the generalized Büchi (word) automaton given by Ah = ⟨Vh, δh, Fh⟩, where Fh is the restriction of F to the set Vh; this can be done in linear time [15]. In the second case, one has to check that there is a call v' reachable from v in Ah, and then that there is an accepting monotone maximal run in S from v'.

Now, we construct an invariant Büchi tree automaton U capturing the monotone accepting maximal runs of S starting from calls. The idea is to model a monotone run π of S as an infinite tree (a run of U) where each path corresponds to a linear path of π. There are some technical issues to be handled. First, there can be finite linear paths. We use the symbol ⊤ to capture terminal local states of π; the subtree rooted at the node corresponding to a terminal local state is thus completely labelled by ⊤. Also, since we are interested in runs of S, we need to check that there is at least one infinite linear path in π; we do this by using as invariant set the set of all U states except the state ⊤. Second, when a module call associated with a box b occurs, multiple module instances I1, . . . , Im are activated and start running. We encode these local runs (corresponding to linear paths of π) on the same path of the run of U by using states of the form (b, v1, . . . , vm, i1, . . . , im, j), where v1, . . . , vm are the current nodes or calls of each module, and i1, . . . , im, j are finite counters used to check the fulfillment of the Büchi condition (see below). Since in monotone runs there are no returns from calls, when a module Ij (with 1 ≤ j ≤ m) moves to a call vertex v (by an internal move), we can separate the linear paths starting from v from the local runs associated with all the modules I1, . . . , Im except Ij. Therefore, in order to simulate an internal move (in the context of modules I1, . . . , Im), U nondeterministically splits into d + 1 copies for some 0 ≤ d ≤ m, such that d copies correspond to those modules (among I1, . . . , Im) which move to call vertices, and the (d + 1)-th copy goes to a state s of the form (b, v'1, . . . , v'm, i'1, . . . , i'm, j') which describes the new status of the modules I1, . . . , Im. Note that in s, we continue to keep track of those modules which are busy in a parallel call.
The Büchi condition F of S is captured with the Büchi condition of U along with the use of finite counters implemented in the states. For ease of presentation, we assume that F consists of a single accepting set F. Then, we use states of the form (v, i) to model a call v, where the counter i is used to check that linear paths (in the simulated monotone run of S) containing infinitely many occurrences of calls satisfy the Büchi condition. In particular, it has default value 0 and is set to 1 if either v ∈ F or a vertex in F is visited in the portion of the linear path from the last call before entering v. In the second case, the needed information is kept in the counters ih of the states of the form (b, v1, . . . , vm, i1, . . . , im, j). Counter ih has default value 0 and is set to 1 if a node in F is entered in the local computation of the corresponding module. Counter j ∈ {0, . . . , m} is instead used to check that the Büchi condition of S is satisfied for linear paths corresponding to infinite local computations (without nested calls) of the modules refining the box b. Moreover, in order to check that a node vh corresponds to a terminal node (i.e., a node without linear successors in the simulated monotone run of S), U can choose nondeterministically to set the corresponding counter ih to −1. Consistently, U will then simulate only the internal moves from vertices v1, . . . , vm in which the module instance associated with vh does not evolve. Thus, we have the following lemma.

Lemma 1. For a call v, there is an accepting monotone maximal run of S from v iff there is an accepting run in U starting from (v, 0).

When the Büchi condition consists of n > 1 accepting sets, the only changes in the above construction concern the counters: we need to check that each set is met, and thus the 0–1 counters become counters up to n and the other counter is up to m · n. Therefore, denoting by ρ = rank(S), nV the number of vertices of S, and nδ the number of transitions of S, the number of U states is O(ρ · nV^{ρ+1}) and the number of U transitions is O(ρ² · nV^{2ρ+2} · (nV + nδ)^{ρ+1}). Thus, by Proposition 1, Remark 1, and Lemma 1 we obtain the following result.

Lemma 2. The problem of checking the existence of accepting monotone maximal runs in a B-CRSM S can be decided in time O(ρ⁴ · nV^{4ρ+4} · (nV + nδ)^{2ρ+2}).
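The counters above implement the standard degeneralization of a generalized Büchi condition {F1, . . . , Fn}. A minimal sketch of the counter update, phrased for word automata (the tree-automaton counters used in the construction follow the same idea; names are illustrative):

```python
def next_counter(j, state, fam):
    """Standard degeneralization counter: j ranges over 0..n, where
    fam = [F1, ..., Fn] (0-indexed list of accepting sets).

    From a value j < n, the counter advances to j + 1 when the
    (j+1)-th accepting set is visited; value n signals that every set
    has been met since the last reset, and restarts the cycle.  A run
    satisfies the generalized condition iff the counter reaches n
    infinitely often.
    """
    n = len(fam)
    if j == n:                       # all sets seen: restart the cycle
        j = 0
    if j < n and state in fam[j]:    # fam[j] is F_{j+1}
        j += 1
    return j
```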
4.2 The Emptiness Problem for Büchi CRSM

In this subsection, we show that the emptiness problem for Büchi CRSM can be reduced to checking the existence of accepting monotone maximal runs. We fix a B-CRSM S = ⟨(S1, . . . , Sk), start, F, PF, Psync⟩. Moreover, nV (resp., nδ) denotes the number of vertices (resp., transitions) of S. Also, let ρ := rank(S).

For F ⊆ V, a finite path π = q0 →(σ0,Λ0) q1 →(σ1,Λ1) . . . qn of KS (with qi = (ti, Di) for any 0 ≤ i ≤ n, and t0 = {ε}) is F-accepting iff π satisfies the synchronized acceptance condition Psync and all the linear paths of π starting from the local state (0, ε) contain occurrences of local states (i, z) such that Di(z) ∈ F. For a box b ∈ B, we say π is a b-path if Di(ε) = b for all 1 ≤ i ≤ n − 1. We need the following preliminary result.

Lemma 3 (Generalized Reachability Problem). Given F ⊆ V, the set of pairs (v, v′) such that v = (b, e1, . . . , em) is a call, v′ = (b, x1, . . . , xm) is a matching return, and there is an F-accepting b-path from v to v′, can be computed in time O(nV² · 4^ρ · (nV + nδ)^ρ).

Now, we show how to solve the emptiness problem for B-CRSM using the results stated by Lemmata 2 and 3. Starting from the B-CRSM S with F = {F1, . . . , Fn}, we construct a new B-CRSM S′ such that emptiness for S reduces to checking the existence of accepting monotone maximal runs in S′. S′ = ⟨(S′1, . . . , S′k), start, F′, P′F, P′sync⟩, with F′ = {F′1, . . . , F′n}, is defined as follows. For 1 ≤ i ≤ k, S′i is obtained by extending the set of nodes and the transition function of Si as follows. For any call v = (b, e1, . . . , em) of Si and matching return v′ = (b, x1, . . . , xm) such that there is a V-accepting b-path in S from v to v′, we add two new nodes ucnew and urnew, and the edge (ucnew, ⊥, urnew), where ⊥ is a fresh non-synchronization symbol. We say ucnew (resp., urnew) is associated with the call v (resp., the return v′). Moreover, for any edge in Si of the form (u, σ, v) (resp., of the form (v′, σ, u)) we add in S′i the edge (u, σ, ucnew) (resp., the edge (urnew, σ, u)). Also, for 1 ≤ i ≤ n, if there is an Fi-accepting b-path from v to v′, then we add urnew to F′i (F′i also contains all elements of Fi). Still, if v′ ∈ PF, then we add urnew to P′F (P′F also contains all elements of PF). Note that ucnew ∉ P′F. In fact, if an accepting maximal run of S′ has a local state labelled by ucnew, then the linear successor of this local state is defined and is labelled by urnew. Finally, P′sync(v′0, {v′1, . . . , v′m}) (with m ≤ rank(S)) holds iff there are v0, . . . , vm ∈ V such that Psync(v0, {v1, . . . , vm}) holds and, for all 0 ≤ j ≤ m, either v′j = vj, or vj is a return (resp., a call) and v′j is a “new” node associated with it. Thus, we obtain the following result.

Lemma 4. For any node u of S, there is an accepting maximal run in S from u iff there is an accepting monotone maximal run in S′ from u.

Note that the number of new nodes is bounded by 2nV², the number of new edges is bounded by nV · nδ + nV², and, by Lemma 3, S′ can be constructed in time O(|F| · nV² · 4^ρ · (nV + nδ)^ρ). Thus, by Lemmata 2 and 4 we obtain the main result of this section.

Theorem 1. Given a B-CRSM S, the problem of checking the emptiness of S can be decided in time O((|F| · (nV + nδ))^{O(ρ)}).
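The construction of S′ can be phrased operationally. The following Python sketch adds the summary nodes and edges for one module, assuming the Lemma 3 computation is available as an oracle; all names are illustrative, and the updates of the accepting sets F′i and of P′F are elided:

```python
def add_summary_edges(module, calls_with_returns, has_accepting_bpath):
    """module: object with .nodes (set) and .edges (set of (u, sym, w));
    calls_with_returns: matching (call v, return v') pairs;
    has_accepting_bpath(v, v'): oracle for the V-accepting b-path
    test of Lemma 3 (an assumption of this sketch)."""
    BOTTOM = "_bot"  # stands in for the fresh non-synchronization symbol
    for v, v_ret in calls_with_returns:
        if not has_accepting_bpath(v, v_ret):
            continue
        uc = ("uc_new", v, v_ret)   # fresh node associated with the call
        ur = ("ur_new", v, v_ret)   # fresh node associated with the return
        module.nodes.update([uc, ur])
        module.edges.add((uc, BOTTOM, ur))       # the edge (uc, bottom, ur)
        # Every edge entering the call also enters uc; every edge
        # leaving the return also leaves ur.
        for (u, sym, w) in list(module.edges):
            if w == v:
                module.edges.add((u, sym, uc))
            if u == v_ret:
                module.edges.add((ur, sym, w))
    return module
```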
5 Model Checking CRSM Against ConCaRet

In this section, we solve the model-checking problem of CRSM against ConCaRet using an automata-theoretic approach: for a CRSM S and a ConCaRet formula ϕ, we construct a B-CRSM Sϕ which has an accepting maximal run iff S has a maximal run that satisfies ϕ. More precisely, an accepting maximal run of Sϕ corresponds to a maximal run π of S where each local state is equipped with the information concerning the set of subformulas of ϕ that hold at it along π. The construction proposed here follows and extends that given in [2] for CaRet. The extensions are due to the presence of the branching-time modalities and the parallel operator. For branching-time modalities we have to ensure that the existential (resp., universal) next requirements are met in some (resp., each) linear successor of the current local state. This is captured locally in the transitions of Sϕ. Parallel formulas are handled instead by the synchronization predicate. The generalized Büchi condition is used to guarantee the fulfillment of liveness requirements ϕ2 in until formulas of the form ϕ1 U^b ϕ2, where b ∈ {a, ∃, ∀} (caller-until formulas do not require such a condition, since a caller path is always finite). For existential until formulas ϕ′, when ϕ′ is asserted at a local state (i, y), we have to ensure that ϕ′ is satisfied in at least one of the linear paths from (i, y). In order to achieve this and ensure the acceptance of all infinite linear paths from (i, y), we use a fresh atomic proposition τϕ′.

For every vertex/edge in S, we have 2^{O(|ϕ|·rank(S))} vertices/edges in Sϕ. Also, the number of accepting sets in the generalized Büchi condition is at most O(|ϕ|), and rank(Sϕ) = rank(S). Since there is a maximal run of S satisfying formula ϕ iff there is an accepting maximal run of Sϕ, model checking S against ϕ reduces to checking emptiness for the Büchi CRSM S¬ϕ. For rank(S) = 1, the considered problem coincides with the model checking problem of RSM against CaRet, which is Exptime-complete (even for a fixed RSM). Therefore, by Theorem 1, we obtain the following result.

Theorem 2. For a CRSM S and a formula ϕ of ConCaRet, the model checking problem for S against ϕ can be decided in time exponential in |ϕ| · (rank(S))². The problem is Exptime-complete (even when the CRSM is fixed).

References

1. R. Alur, M. Benedikt, K. Etessami, P. Godefroid, T. Reps, and M. Yannakakis. Analysis of recursive state machines. To appear in ACM Transactions on Programming Languages and Systems, 2005.
2. R. Alur, K. Etessami, and P. Madhusudan. A Temporal Logic of Nested Calls and Returns. In Proc. of TACAS'04, pp. 467–481, 2004.
3. R. Alur, S. Kannan, and M. Yannakakis. Communicating hierarchical state machines. In Proc. of ICALP'99, LNCS 1644, pp. 169–178, 1999.
4. A. Bouajjani, J. Esparza, and O. Maler. Reachability Analysis of Pushdown Automata: Application to Model-Checking. In Proc. of CONCUR'97, LNCS 1243, pp. 135–150, 1997.
5. A. Bouajjani, J. Esparza, and T. Touili. A generic approach to the static analysis of concurrent programs with procedures. In Proc. of POPL'03, pp. 62–73, 2003.
6. A. Bouajjani and T. Touili. Reachability Analysis of Process Rewrite Systems. In Proc. of FSTTCS'03, LNCS 2914, pp. 74–87, 2003.
7. A. Bouajjani, M. Müller-Olm, and T. Touili. Regular Symbolic Analysis of Dynamic Networks of Pushdown Systems. In Proc. of CONCUR'05, LNCS 3653, pp. 473–487, 2005.
8. W.S. Brainerd. Tree generating regular systems. Information and Control, 14:217–231, 1969.
9. J. Esparza and M. Nielsen. Decidability Issues for Petri Nets. J. Inform. Process. Cybernet., EIK 30(3):143–160, 1994.
10. J. Esparza and J. Knoop. An automata-theoretic approach to interprocedural parallel flow graphs. In Proc. of FoSSaCS'99, LNCS 1578, 1999.
11. J. Esparza and A. Podelski. Efficient Algorithms for pre* and post* on Interprocedural Parallel Flow Graphs. In Proc. of POPL'00, pp. 1–11, 2000.
12. C. Löding. Infinite Graphs Generated by Tree Rewriting. Doctoral thesis, RWTH Aachen, 2003.
13. R. Mayr. Process Rewrite Systems. Information and Computation, 156:264–286, 2000.
14. S. Qadeer and J. Rehof. Context-bounded model checking of concurrent software. In Proc. of TACAS'05, LNCS 3440, pp. 93–107, 2005.
15. W. Thomas. Automata on infinite objects. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Vol. B, pp. 133–191, 1990.
16. M.Y. Vardi and P. Wolper. Automata-theoretic techniques for modal logics of programs. Journal of Computer and System Science, 32(2):183–221, 1986.
17. I. Walukiewicz. Pushdown processes: Games and model checking. In Int. Conf. on Computer Aided Verification, LNCS 1102, pp. 62–74. Springer-Verlag, 1996.
18. URL: www.dia.unisa.it/professori/latorre/Papers/concaret.ps.gz

What's Decidable About Arrays?

Aaron R. Bradley, Zohar Manna, and Henny B. Sipma
Computer Science Department, Stanford University, Stanford, CA 94305-9045
{arbrad, zm, sipma}@theory.stanford.edu

Abstract. Motivated by applications to program verification, we study a decision procedure for satisfiability in an expressive fragment of a theory of arrays, which is parameterized by the theories of the array elements. The decision procedure reduces satisfiability of a formula of the fragment to satisfiability of an equisatisfiable quantifier-free formula in the combined theory of equality with uninterpreted functions (EUF), Presburger arithmetic, and the element theories. This fragment allows a constrained use of universal quantification, so that one quantifier alternation is allowed, with some syntactic restrictions. It allows expressing, for example, that an assertion holds for all elements in a given index range, that two arrays are equal in a given range, or that an array is sorted. We demonstrate its expressiveness through applications to verification of sorting algorithms and parameterized systems. We also prove that satisfiability is undecidable for several natural extensions to the fragment. Finally, we describe our implementation in the πVC verifying compiler.

(This research was supported in part by NSF grants CCR-01-21403, CCR-02-20134, CCR-02-09237, CNS-0411363, and CCF-0430102, by ARO grant DAAD19-01-10723, and by NAVY/ONR contract N00014-03-1-0939. The first author was additionally supported by a Sang Samuel Wang Stanford Graduate Fellowship.)

1 Introduction

Software verification — whether via the classical Floyd-Hoare-style proof method with some automatic invariant generation, or through automatic methods like predicate abstraction — relies on the fundamental technology of decision procedures. Therefore, the properties of software that can be proved automatically are to a large extent limited by the expressiveness of the underlying fragments of theories for which satisfiability is decidable and can be checked efficiently.

Arrays are a basic data structure of imperative programming languages. Theories for reasoning about the manipulation of arrays in programs have been studied intermittently for about as long as computer science has been a recognized field [5]. Nonetheless, the strongest predicate about arrays that appears in a decidable fragment is equality between two unbounded arrays [7]. For software verification, unbounded equality is not enough: for example, assertions such as that two subarrays are equal or that all elements of a subarray satisfy a certain property are not uncommon in normal programming tasks. We study a fragment of a theory of arrays that allows expressing such properties and many others, and for which satisfiability is decidable.

Various theories of arrays have been addressed in past work. Research in satisfiability decision procedures has focused on the quantifier-free fragments of array theories, as the full theories are undecidable (see Section 5).
In our discussion, we use the sorts array, elem, and index for arrays, elements, and indices, respectively. The syntax a[i] represents an array read, while a{i ← e} represents the array with position i modified to e, for array a, elem e, and index i. McCarthy proposed the main axiom of arrays, read-over-write [5]:

(∀ array a)(∀ elem e)(∀ index i, j)( (i = j → a{i ← e}[j] = e) ∧ (i ≠ j → a{i ← e}[j] = a[j]) )

An extensional theory of arrays has been studied formally, most recently in [7] and [1]. The extensional theory relates equations between arrays and equations between their elements:

(∀ array a, b)[ ((∀ index i) a[i] = b[i]) → a = b ]
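The two axioms have a direct executable model. The following Python sketch (illustrative, not from the paper) represents arrays as immutable maps and spot-checks read-over-write on sample points:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arr:
    """Functional arrays as immutable maps: a toy model of the read
    a[i] and the write a{i <- e}."""
    data: tuple = ()              # tuple of (index, element) pairs
    def read(self, i):
        for j, e in reversed(self.data):   # last write to i wins
            if j == i:
                return e
        return None                        # unconstrained default value
    def write(self, i, e):
        return Arr(self.data + ((i, e),))  # a{i <- e}, leaving a intact

# Read-over-write, checked on a few sample points:
a = Arr(((0, 10), (1, 20)))
for i in range(3):
    for j in range(3):
        w = a.write(i, 99)
        assert (w.read(j) == 99) if i == j else (w.read(j) == a.read(j))
```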
In [8], a decidable quantifier-free fragment of an array theory that allows a restricted use of a permutation predicate is studied. Their motivation, as with our work, is that verification of software requires decision procedures for expressive assertion languages. They use their decision procedure to prove that various sorting algorithms return a permutation of their input. In the conclusion of [8], they suggest that a predicate expressing the sortedness of arrays would be useful.

The main theory of arrays that we study in this paper is motivated by practical requirements in software verification. We use Presburger arithmetic for our theory of indices, so the abstract sort index is concrete for us. Additionally, the theory is parameterized by the element theories used to describe the contents of arrays. Typical element theories include the theory of integers, the theory of reals, and the theory of equality. Our satisfiability decision procedure is for a fragment, which we call the array property fragment, that allows a constrained use of universal quantification. We characterize the fragment in Section 2, but for now we note that the decidable fragment is capable of expressing array equality, the usual equality in an extensional theory of arrays; bounded equality, equality between two subarrays; and various properties, like sortedness, of (sub)arrays.

The satisfiability procedure reduces satisfiability of a formula of the array property fragment to satisfiability of a quantifier-free formula in the combined theory of equality with uninterpreted functions (EUF), Presburger arithmetic, and the element theories. The original formula is equisatisfiable to the reduced formula. For satisfiability, handling existential quantification is immediate. Universally quantified assertions are converted to finite conjunctions by instantiating the quantified index variables over a finite set of index terms. The main insight of the satisfiability decision procedure, then, is that for a given formula in the fragment, there is a finite set of index terms such that instantiating universally quantified index variables from only this set is sufficient for completeness (and, trivially, soundness).

After presenting and analyzing this decision procedure, we study a theory of maps, which are like arrays except that indices are uninterpreted. Therefore, the decidable fragment of the theory is less powerful for reasoning about arrays; however, it is more expressive than, for example, the quantifier-free fragment of the extensional theory presented in [7]. In particular, it is expressive enough to reason about hashtables.

The paper is organized as follows. Section 2 defines the theory and the fragment that we study. Section 3 describes the decision procedure for satisfiability of the fragment. In Section 4, we prove that the procedure is sound and complete. We also prove that when satisfiability for quantifier-free formulae of the combined theory of EUF, Presburger arithmetic, and array elements is in NP, then satisfiability for bounded fragments is NP-complete. In Section 5, we prove that several natural extensions to the fragment result in fragments for which satisfiability is undecidable; we identify one slightly larger fragment for which decidability remains open. Section 6 presents and analyzes a parametric theory of maps. Section 7 motivates the theories with several applications in software verification. We implemented the procedure in our verifying compiler πVC; we describe our experience and results in Section 7.4.

2 An Array Theory and Fragment

We introduce the theory of arrays and the array property fragment for which satisfiability is decidable.

Definition 1 (Theories). The theory of arrays uses Presburger arithmetic, TZ, for array indices, and the parameter element theories Telem^1, . . . , Telem^m, for m > 0, for its elements. The many-sorted array theory for the given element theories is called TA^{{elemk}k}. We usually drop the superscript. Recall that the signature of Presburger arithmetic is

ΣZ = {0, 1, +, −, =, <} .

Assume each Telem^k has signature Σelem^k. TA then has signature

ΣA = ΣZ ∪ ∪k Σelem^k ∪ {·[·], ·{· ← ·}}

where the two new functions are read and write, respectively. The read a[i] returns the value stored at position i of a, while the write a{i ← e} is the array a modified so that position i has value e. For multidimensional arrays, we abbreviate a[i] · · · [j] with a[i, . . . , j]. The theory of equality with uninterpreted functions (EUF), TEUF, is used in the decision procedure.

Definition 2 (Terms and Sorts). Index variables and terms have sort Z and are Presburger arithmetic terms. Element variables and terms have sort elemk, for some element theory Telem^k. Array variables and terms have functional sorts constructed from the Z and elemk sorts:
– One-dimensional sort: Z → elemk, for some element theory Telem^k;
– Multidimensional sort: Z → · · · → elemk, for some element theory Telem^k; e.g., a two-dimensional array has sort Z → Z → elemk.

For array term a, index term i, and element term e, both a and a{i ← e} are array terms; the latter term is a with position i modified to e. For array term a and index term i, a[i] is either an element term if a has sort Z → elemk, or an array term if a has a multidimensional sort; e.g., if a has sort Z → Z → elemk, then a[i, j] is an element term of sort elemk, while a[i] is an array term of sort Z → elemk.

Definition 3 (Literal and Formula). A TA-literal is either a TZ-literal or a Telem^k-literal; literals can contain array subterms. A formula ψ in TA is a quantified Boolean combination of TA-literals. Notationally, we say ψ[t] is the formula that contains subterm t. t ∈ ψ is true iff ψ contains subterm t.

We study satisfiability for a fragment of TA that is a subset of the ∃*∀*Z-fragment of TA, where the subscript on ∀ indicates that the quantifier is only over index variables. We call this fragment the array property fragment.

Definition 4 (Array Property). An array property is a formula of the form

(∀i)(ϕI(i) → ϕV(i))

where i is a vector of index variables, and ϕI(i) and ϕV(i) are the index guard and the value constraint, respectively. The height of the property is the number of quantified index variables in the formula.
The form of an index guard ϕI(i) is constrained according to the grammar

iguard → iguard ∧ iguard | iguard ∨ iguard | atom
atom → expr ≤ expr | expr = expr
expr → uvar | pexpr
pexpr → Z | Z · evar | pexpr + pexpr

where uvar is any universally quantified variable, and evar is any existentially quantified integer variable.

The form of a value constraint ϕV(i) is also constrained. Any occurrence of a quantified index variable i ∈ i in ϕV(i) must be as a read into an array, a[i], for array term a. Array reads may not be nested; e.g., a1[a2[i]] is not allowed.

Definition 5 (Array Property Fragment). The array property fragment of TA consists of all existentially-closed Boolean combinations of array property formulae and quantifier-free TA-formulae. The height of a formula in the fragment is the maximum height of an array property subformula.

Example 1 (Equality Predicates). Extensionality can be encoded in the array property fragment. We present = and bounded equality as defined predicates. In the satisfiability decision procedure, instances of defined predicates are expanded to their definitions in the first step.

Equality. a = b: Arrays a and b are equal.
(∀i)(a[i] = b[i])

Bounded Equality. beq(ℓ, u, a, b): Arrays a and b are equal in the interval [ℓ, u].
(∀i)(ℓ ≤ i ≤ u → a[i] = b[i])

Example 2 (Sorting Predicates). More specialized predicates can also be defined in the array property fragment. Consider the following predicates for specifying properties useful for reasoning about sortedness of integer arrays in the array property fragment of TA^{Z}.

Sorted. sorted(ℓ, u, a): Integer array a is sorted (nondecreasing) between positions ℓ and u.
(∀i, j)(ℓ ≤ i ≤ j ≤ u → a[i] ≤ a[j])

Partitioned. partitioned(ℓ1, u1, ℓ2, u2, a): Integer array a is partitioned such that all elements in [ℓ1, u1] are less than or equal to every element in [ℓ2, u2].
(∀i, j)(ℓ1 ≤ i ≤ u1 < ℓ2 ≤ j ≤ u2 → a[i] ≤ a[j])

The literal u1 < ℓ2 can be expressed as u1 ≤ ℓ2 − 1 so that the syntactic restrictions are met.
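On concrete finite arrays, these defined predicates have direct executable counterparts, which can be useful for testing annotated programs; a small Python sketch with illustrative names:

```python
def beq(l, u, a, b):
    """Bounded equality: a and b agree on the interval [l, u]."""
    return all(a[i] == b[i] for i in range(l, u + 1))

def sorted_between(l, u, a):
    """sorted(l, u, a): a is nondecreasing between positions l and u."""
    return all(a[i] <= a[j] for i in range(l, u + 1)
                            for j in range(i, u + 1))

def partitioned(l1, u1, l2, u2, a):
    """Every element of a[l1..u1] is <= every element of a[l2..u2],
    assuming l1 <= u1 < l2 <= u2."""
    return all(a[i] <= a[j] for i in range(l1, u1 + 1)
                            for j in range(l2, u2 + 1))

assert sorted_between(0, 3, [1, 2, 2, 5])
assert partitioned(0, 1, 2, 3, [1, 2, 3, 4])
```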
Example 3 (Array Property Formula). The following formula is in the array property fragment of TA^{Z}:

(∃ array a)(∃w, x, y, z, k, ℓ, n ∈ Z)
( w < x < y < z ∧ 0 < k < ℓ < n ∧ ℓ − k > 1
  ∧ sorted(0, n − 1, a{k ← w}{ℓ ← x})
  ∧ sorted(0, n − 1, a{k ← y}{ℓ ← z}) ) .

3 Decision Procedure SATA

We now define the decision procedure SATA for satisfiability of formulae from the array property fragment. After removing array writes and skolemizing existentially quantified variables, SATA rewrites universally quantified subterms to finite conjunctions by instantiating the quantified variables over a set of index terms. The next definitions construct the set of index terms that is sufficient for making this procedure complete.

Definition 6 (Read Set). The read set for formula ψ is the set

R ≝ {t : ·[t] ∈ ψ}

for t representing index terms that are not universally quantified index variables.

Definition 7 (Bounds Set). The bounds set B for formula ψ is the set of Presburger arithmetic terms that arise as root pexprs (i.e., pexpr terms whose parent is an expr) during the parsing of all index guards in ψ, according to the grammar of Def. 4.

The read set R is the set of index terms at which some array is read, while the bounds set B is the set of index terms that define boundaries on some array for an array property (e.g., the boundaries of an interval in which array elements are sorted).

Definition 8 (Index Set). For a formula ψ, define

Iψ ≝ {0} if R = B = ∅, and Iψ ≝ R ∪ B otherwise.

The procedure reduces the satisfiability of an array property formula ψ to the satisfiability of a quantifier-free (TEUF ∪ TZ ∪ ∪k Telem^k)-formula.

Definition 9 (SATA).
1. Replace instances of defined predicates with their definitions, and convert to negation normal form.
2. Apply the following rule exhaustively to remove writes:

ψ[a{i ← e}]  ⇒  ψ[b] ∧ b[i] = e ∧ (∀j)(j ≠ i → a[j] = b[j])   for fresh b   (write)

To meet the syntactic requirements on an index guard, we rewrite the third conjunct as (∀j)(j ≤ i − 1 ∨ i + 1 ≤ j → a[j] = b[j]).
3. Apply the following rule exhaustively:

ψ[(∃i)(ϕI(i) ∧ ¬ϕV(i))]  ⇒  ψ[ϕI(j) ∧ ¬ϕV(j)]   for fresh j   (exists)

4. Apply the following rule exhaustively, where Iψ3 is determined by the formula constructed in Step 3:

ψ[(∀i)(ϕI(i) → ϕV(i))]  ⇒  ψ[ ⋀_{i ∈ Iψ3^n} (ϕI(i) → ϕV(i)) ]   (forall)

5. Associate with each n-dimensional array variable a a fresh n-ary uninterpreted function fa, and replace each array read a[i, . . . , j] by fa(i, . . . , j). Decide this formula's satisfiability using a procedure for quantifier-free formulae of TEUF ∪ TZ ∪ ∪k Telem^k.

Step 2 introduces new index terms (i − 1 and i + 1, above).

Example 4 (New Indices). Consider again the array property formula

w < x < y < z ∧ 0 < k < ℓ < n ∧ ℓ − k > 1
∧ sorted(0, n − 1, a{k ← w}{ℓ ← x}) ∧ sorted(0, n − 1, a{k ← y}{ℓ ← z})

(which is existentially closed). The first step of SATA replaces the sorted literals with definitions; the second applies write to remove array writes. For readability, we write the index guards resulting from write using disequalities:

w < x < y < z ∧ 0 < k < ℓ < n ∧ ℓ − k > 1
∧ (∀i, j)(0 ≤ i ≤ j ≤ n − 1 → c[i] ≤ c[j]) ∧ (∀i, j)(0 ≤ i ≤ j ≤ n − 1 → e[i] ≤ e[j])
∧ (∀i)(i ≠ ℓ → b[i] = c[i]) ∧ c[ℓ] = x ∧ (∀i)(i ≠ k → a[i] = b[i]) ∧ b[k] = w
∧ (∀i)(i ≠ ℓ → d[i] = e[i]) ∧ e[ℓ] = z ∧ (∀i)(i ≠ k → a[i] = d[i]) ∧ d[k] = y

Then R = {k, ℓ}, B = {0, n − 1, ℓ − 1, ℓ + 1, k − 1, k + 1}, and Iψ = {0, n − 1, k − 1, k, k + 1, ℓ − 1, ℓ, ℓ + 1}. Note that R and B do not include i or j, which are universally quantified, while B contains the terms produced by converting disequalities to disjunctions of inequalities. Applying forall to each array property subformula converts universal quantification to finite conjunction over Iψ. We have in particular that

c[k + 1] ≤ c[ℓ] = x < y = d[k] ≤ d[k + 1] ,

yet c[k + 1] = b[k + 1] = a[k + 1] = d[k + 1], a contradiction. Thus, the original formula is TA^{Z}-unsatisfiable. The index term k + 1 is essential for this proof.

Fig. 1. Unsorted arrays

We visualize this situation in Figure 1. Arrows indicate positions represented by the new indices introduced in Step 2. Pictorially, for both modified versions of a to be sorted requires that the two parallel lines in Figure 1 be one line. To prove that sortedness is impossible requires considering elements in the interval between k and ℓ, not just elements at positions k and ℓ.
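A minimal Python sketch of Definition 8 and of the instantiation performed by rule forall, assuming index terms are already evaluated to integers and treating the guard and value constraint as predicates (all names illustrative; in the real procedure the conjunction is built syntactically):

```python
from itertools import product

def index_set(R, B):
    """Def. 8: the index set is R ∪ B, or {0} when both are empty."""
    return (R | B) or {0}

def instantiate_forall(height, guard, value, I):
    """Rule forall: replace (∀ i)(guard -> value) by the finite
    conjunction over all tuples from I^height, here evaluated under a
    concrete model rather than built as a formula."""
    return all((not guard(ix)) or value(ix)
               for ix in product(sorted(I), repeat=height))

# Toy use: instantiate (∀ i)(0 <= i <= 3 -> a[i] <= a[i+1]) over the
# index set only (a is a concrete model of a one-dimensional array).
a = {0: 1, 1: 2, 2: 2, 3: 5, 4: 7}
I = index_set({0, 4}, {0, 3})
print(instantiate_forall(1,
                         lambda ix: 0 <= ix[0] <= 3,
                         lambda ix: a[ix[0]] <= a[ix[0] + 1], I))
```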
4 Correctness

We prove the soundness and completeness of SATA. Additionally, we show that if satisfiability of quantifier-free (TEUF ∪ TZ ∪ ∪k Telem^k)-formulae is in NP, then satisfiability for each bounded fragment, in which all array properties have maximum height N, is NP-complete. We refer to the formula constructed in Step n of SATA by ψn; e.g., ψ5 is the final quantifier-free formula constructed in Step 5.

Lemma 1 (Complete). If ψ5 is satisfiable, then ψ is satisfiable.

Proof. Suppose that I is an interpretation such that I |= ψ5; we construct an interpretation J such that J |= ψ. To this end, we define under I a projection operation, proj : Z → Iψ3^I: proj(z) = t^I such that t ∈ Iψ3, and either t^I ≤ z and (∀s ∈ Iψ3)(s^I ≤ t^I ∨ s^I > z), or t^I > z and (∀s ∈ Iψ3)(s^I ≥ t^I). That is, proj(z) is the nearest neighbor to z in Iψ3^I, with preference for left neighbors. Extend proj to tuples of integers in the natural way: proj(z1, . . . , zk) = (proj(z1), . . . , proj(zk)).

Equate all non-array variables in J and I; note that proj is now defined the same under I and J. For each k-dimensional array a of ψ, set a^J[z] = fa^J(proj(z)). We now prove that J |= ψ.

The manipulations in Steps 1, 3, and 5 are trivial. Step 2 implements the definition of array write, so that the resulting formula is equivalent to the original formula. Thus, we focus on Step 4. We prove that if J |= ψ4, then J |= ψ3. Suppose that rule forall is applied to convert ψb to ψa and that J |= ψa. Application of this rule is the main focus of the proof: we prove that J |= ψb. That is, we assume that

J |= ψ′[ ⋀_{i ∈ Iψ^n} (ϕI(i) → ϕV(i)) ]   (1)

(the formula ψa) and prove that

J |= ψ′[ (∀i)(ϕI(i) → ϕV(i)) ]   (2)

(the formula ψb). Below, we prove that

J |= [ ⋀_{i ∈ Iψ^n} (ϕI(i) → ϕV(i)) ] → (∀i)(ϕI(i) → ϕV(i)) ,

which implies (2), since ψ′ is in negation normal form. Our proof takes the form of a square of implications: for arbitrary z ∈ Z^n, the top implication ϕI(proj(z)) → ϕV(proj(z)) follows from (1) and the definition of proj, and we establish the bottom implication ϕI(z) → ϕV(z) (the one in question) by proving (A) ϕI(z) → ϕI(proj(z)) and (B) ϕV(proj(z)) → ϕV(z).

For (A), consider the atoms of the index guard under J. If ℓ^J ≤ m^J, then the definition of proj implies that proj(ℓ^J) ≤ proj(m^J). At worst, it may be that ℓ^J < m^J, while proj(ℓ^J) = proj(m^J). For an equation, ℓ^J = m^J iff proj(ℓ^J) = proj(m^J). Then (A) follows by structural induction over the index guard, noting that the index guard is a positive Boolean combination of atoms.

For (B), recall that arrays in J are constructed using proj. In particular, for any z, a^J[z] = a^J[proj(z)], so that (B) follows. Therefore, J |= ψb, and SATA is complete.
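The projection operation admits a short executable reading over the evaluated index set; a Python sketch (illustrative names):

```python
def proj(z, idx_vals):
    """Nearest neighbor of z in idx_vals (the index-set values under
    the interpretation), preferring left neighbors, as in the proof."""
    left = [t for t in idx_vals if t <= z]
    return max(left) if left else min(idx_vals)

def proj_tuple(zs, idx_vals):
    # The natural extension of proj to tuples of integers.
    return tuple(proj(z, idx_vals) for z in zs)

assert proj(5, {0, 3, 7}) == 3     # nearest left neighbor
assert proj(-2, {0, 3, 7}) == 0    # below all values: least element
```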
Lemma 2 (Sound). If ψ is satisfiable, then ψ5 is satisfiable.

Proof. An interpretation I satisfying ψ can be altered to an interpretation J satisfying ψ5 by assigning fa^J(i^I) = a^I[i^I] for each array variable a and equating all else. Universal quantification is replaced by conjunction over a finite subset of all indices, thus weakening each (positive) literal.

Theorem 1. If satisfiability of quantifier-free (TEUF ∪ TZ ∪ ∪k Telem^k)-formulae is decidable, then SATA is a decision procedure for satisfiability in the array property fragment of TA^{{elemk}k}.

Theorem 2 (NP-Complete). If satisfiability of quantifier-free (TEUF ∪ TZ ∪ ∪k Telem^k)-formulae is in NP, then for the subfragment of the array property fragment of TA^{{elemk}k} in which all array property formulae have height at most N, satisfiability is NP-complete.

Proof. NP-hardness, even when ψ is a conjunction of literals, follows by NP-hardness of satisfiability of TZ [6]. Steps 1–3 increase the size of the formula by an amount linear in the size of ψ. The rule forall increases the size of formulae by an amount polynomial in the size of ψ and exponential in the maximum height N. For fixed N, the increase is thus polynomial in ψ. The proof requires only a polynomial number (in the size of ψ) of applications of rules, so that the quantifier-free (TEUF ∪ TZ ∪ ∪k Telem^k)-formula is at most polynomially larger than ψ. Inclusion in NP follows from the assumption of the theorem.

5 Undecidable Problems

Theorem 1 states that for certain sets of element theories {elemk}k, SATA is a satisfiability decision procedure for the array property fragment of TA^{{elemk}k}. The theory of reals, TR, in which variables range over R and with signature ΣR = {0, 1, +, −, =, <}, and the theory of integers, TZ, are such element theories. We now show that several natural extensions of the array property fragment result in a fragment of TA^{R} or TA^{Z} for which satisfiability is undecidable. We identify one extension for which decidability remains open.

Theorem 3. Satisfiability of the ∃*∀Z∃Z-fragment of both TA^{R} and TA^{Z} is undecidable, even with syntactic restrictions like in the array property fragment.

Proof. In [3], we prove that termination of loops of this form is undecidable:

real x1, . . . , xn
θ: ⋀_{i ∈ I ⊆ {1,...,n}} xi = ci
while x1 ≥ 0 do
  choose τi : x := Ai x
done

The ci are constant integers, ci ∈ Z, while each Ai is an n × n constant array of integers, Ai ∈ Z^{n×n}. θ is the initial condition of the loop. Variables x1, . . . , xn range over the reals, R. There are m > 0 transitions, {τ1, . . . , τm}; on each iteration, one is selected nondeterministically to be taken. x is an R^n-vector representing the n variables {x1, . . . , xn}; each transition thus updates all variables simultaneously by a linear transformation. We call loops of this form linear loops. Termination for similar loops in which all variables are declared as integers is also undecidable.

We now prove by reduction from termination of linear loops that satisfiability of the ∃*∀Z∃Z-fragment is undecidable. That is, given a linear loop L, we construct a formula ϕ such that ϕ is unsatisfiable iff L always terminates. In other words, a model of ϕ encodes a nonterminating computation of L.

For each loop variable xi, we introduce an array variable xi. Let ρτ(s, t), for index terms s and t, encode transition τ : x := Ax as follows:

ρτ(s, t) ≝ ⋀_{i=1}^{n} xi[t] = A_{i,1} · x1[s] + · · · + A_{i,n} · xn[s] .

Let g(s), for index term s, encode the guard x1 ≥ 0:

g(s) ≝ x1[s] ≥ 0 .

Let θ(s), for index term s, encode the initial condition:

θ(s) ≝ ⋀_{i ∈ I ⊆ {1,...,n}} xi[s] = ci .

Then form ϕ:

ϕ : (∃x1, . . . , xn, z)(∀i)(∃j)( θ(z) ∧ g(z) ∧ ⋁_k (ρτk(i, j) ∧ g(j)) ) .

Suppose ϕ is satisfiable. Then construct a nonterminating computation s0 s1 s2 . . . as follows. Let each variable xk of state s0 take on the value xk[z] of the satisfying model. For s1, choose the j that corresponds to i = z and assign xk according to xk[j]. Continue forming the computation sequentially. Each state is guaranteed to satisfy the guard, so the computation is nonterminating.

Suppose s0 s1 s2 . . . is a nonterminating computation of L. Then construct the following model for ϕ. Let z = 0; for each index i ≥ 0, set xk[−i] = xk[i] = the value of xk in state si. Therefore, ϕ is unsatisfiable iff L always terminates, and thus satisfiability of the ∃*∀Z∃Z-fragment of TA is undecidable. Note that ϕ meets the syntactic restrictions of the array property fragment, except for the extra quantifier alternation.
Theorem 4. Extending the array property fragment with any of
– nested reads (e.g., a1[a2[i]], where i is universally quantified);
– array reads by a universally quantified variable in the index guard;
– general Presburger arithmetic expressions over universally quantified index variables (even just addition of 1, e.g., i + 1) in the index guard or in the value constraint
results in a fragment of TA^{Z} for which satisfiability is undecidable.

Proof. In TA^{Z}, the presence of nested reads allows skolemizing j in ϕ of the proof of Theorem 3:

(∃x1, . . . , xn, z, aj)(∀i)( θ(z) ∧ g(z) ∧ ⋁_k (ρτk(i, aj[i]) ∧ g(aj[i])) ) .

Allowing array reads in the index guard enables flattening of nested reads through introduction of another universally quantified variable:

ψ[ϕI → ϕV[a[a[i]]]]  ⇒  ψ[ϕI ∧ j = a[i] → ϕV[a[j]]] .

Allowing addition of 1 in the value constraint allows an encoding of termination similar to that in the proof of Theorem 3:

(∃x1, . . . , xn, z)(∀i ≥ z)( θ(z) ∧ g(z) ∧ ⋁_k (ρτk(i, i + 1) ∧ g(i + 1)) ) .

Finally, addition of 1 in the index guard can encode addition of 1 in the value constraint through introduction of another universally quantified variable:

ψ[ϕI → ϕV[a[i + 1]]]  ⇒  ψ[ϕI ∧ j = i + 1 → ϕV[a[j]]] .

Theorem 3 implies that a negated array property cannot be embedded in the consequent of another array property. Theorem 4 states that loosening most syntactic restrictions results in a fragment for which satisfiability is undecidable. One extension remains for which decidability of satisfiability is an open problem: the fragment in which index guards can contain strict inequalities, < (equivalently, in which index guards can contain negations). In this fragment, one could express that an array has unique elements:

(∀i, j)(i < j → a[i] ≠ a[j]) .

6 Maps

We consider an array theory in which indices are uninterpreted. For clarity, we call indices keys in this theory, and call the arrays maps.

Definition 10 (Map Theory). The parameterized map theory TM^{{elemℓ}ℓ} has signature

ΣM = ΣEUF ∪ ∪ℓ Σelem^ℓ ∪ {·[·], ·{· ← ·}} .

Key variables and terms are uninterpreted, with sort EUF. Element variables and terms have some sort elemℓ. Map variables and terms have functional sorts constructed from the EUF and elemℓ sorts; e.g., EUF → elemℓ.

Definition 11 (Map Property Fragment). A map property is a formula of the form (∀k)(ϕK(k) → ϕV(k)), where k is a vector of key variables, and ϕK(k) and ϕV(k) are the key guard and the value constraint, respectively. The height of the property is the number of quantified variables in the formula.

The form of a key guard ϕK(k) is constrained according to the grammar

kguard → kguard ∧ kguard | kguard ∨ kguard | atom
atom → var = var | evar ≠ var | var ≠ evar
var → evar | uvar

where uvar is any universally quantified key variable, and evar is any existentially quantified variable.

The form of a value constraint ϕV(k) is also constrained. Any occurrence of a quantified key variable k ∈ k in ϕV(k) must be as a read into a map, h[k], for map term h. Map reads may not be nested; e.g., h1[h2[k]] is not allowed.

The map property fragment of TM consists of all existentially-closed Boolean combinations of map property formulae and quantifier-free TM-formulae.

Definition 12 (Key Set). For a formula ψ, define R = {t : ·[t] ∈ ψ}; B as the set of variables that arise as evars in the parsing of all key guards according to the grammar of Def. 11; and K = R ∪ B ∪ {κ}, where κ is a fresh variable.

Definition 13 (SATM).
1. Step 1 of SATA.
2. Apply the following rule exhaustively to remove writes:

ψ[h{k ← e}]  ⇒  ψ[h′] ∧ h′[k] = e ∧ (∀j)(j ≠ k → h[j] = h′[j])   for fresh h′   (write)

3. Step 3 of SATA.
4. Apply the following rule exhaustively, where Kψ3 is determined by the formula constructed in Step 3:

ψ[(∀k)(ϕK(k) → ϕV(k))]  ⇒  ψ[ ⋀_{k ∈ Kψ3^n} (ϕK(k) → ϕV(k)) ]   (forall)

5. Construct

ψ4 ∧ ⋀_{k ∈ K\{κ}} k ≠ κ .

6. Step 5 of SATA, except that the resulting formula is decided using a procedure for TEUF ∪ ∪ℓ Telem^ℓ.

Theorem 5. If satisfiability of quantifier-free (TEUF ∪ ∪ℓ Telem^ℓ)-formulae is decidable, then SATM is a decision procedure for satisfiability in the map property fragment of TM^{{elemℓ}ℓ}.

The main idea of the proof, as in the proof of Theorem 1, is to define a projection operation, proj : EUF → Kψ3^I, for interpretation I. For an object o of I, if o = t^I for some t ∈ Kψ3, then proj(o) = o (= t^I); otherwise, proj(o) = κ^I. If proj is used to define J, as in the proof of Theorem 1, then proj preserves equations and disequalities in key guards and values of map reads in value constraints. The relevant undecidability results from Section 5 carry over to maps, with the appropriate modifications.

7 Applications, Implementation, and Results

7.1 Verification of Sorting Algorithms

Figure 2 presents an annotated version of InsertionSort in an imperative language, where the annotations specify that InsertionSort returns a sorted array. @pre, @post, and @ label preconditions, postconditions, and (loop) assertions, respectively. For variable x, x0 refers to its value upon entering a function; |a| maps array a to its length; rv is the value returned by a function. Each verification condition is expressible in the array property fragment of TA^{Z} and is unsatisfiable, proving that InsertionSort returns a sorted array.

@pre ⊤
@post sorted(0, |rv| − 1, rv)
int[] InsertionSort(int[] a) {
  int i, j, t;
  for (i := 1; i < |a|; i := i + 1)
    @(1 ≤ i ∧ sorted(0, i − 1, a) ∧ |a| = |a0|)
  {
    t := a[i];
    for (j := i − 1; j ≥ 0 ∧ a[j] > t; j := j − 1)
      @( 1 ≤ i < |a| ∧ −1 ≤ j ≤ i − 1 ∧ sorted(0, i − 1, a) ∧ |a| = |a0|
         ∧ (j < i − 1 → (a[i − 1] ≤ a[i] ∧ (∀k ∈ [j + 1, i]) a[k] > t)) )
      a[j + 1] := a[j];
    a[j + 1] := t;
  }
  return a;
}

Fig. 2. InsertionSort

7.2 Verification of Parameterized Programs

The parallel composition of an arbitrary number of copies of a process is often represented as a parameterized program. Variables for which one copy appears in each process are modeled as arrays. Thus, it is natural to specify and prove properties of parameterized programs with a language of arrays.

Figure 3 presents a simple semaphore-based algorithm for mutual exclusion among M processes [4]:

int[] y := int[0..M − 1];
θ: y[0] = 1 ∧ (∀j ∈ [1, M − 1]) y[j] = 0

∥_{i ∈ [0, M−1]}
  request(y, i);
  while (true)
    @((∀j ∈ [0, M − 1]) y[j] = 0 ∧ i = i0 ∧ |y| = |y0|)
  {
    critical;
    release(y, i ⊕M 1);
    noncritical;
    request(y, i);
  }

Fig. 3. Sem-N

The semantics of request and release are the usual ones:

request(y, i) : y[i] > 0 ∧ y′ = y{i ← y[i] − 1}
release(y, i) : y′ = y{i ← y[i] + 1}

Mutual exclusion at the critical section is implied by the invariant (∀j ∈ [0, M − 1]) y[j] = 0, which appears as part of the loop invariant. The mutual exclusion property is verified using the array decision procedure.
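For comparison with the static proof of Section 7.1, the annotations of Figure 2 can also be replayed dynamically. The following Python rendition (illustrative; not the paper's pi code) turns the loop assertions into runtime asserts, which check the invariants on a single execution rather than proving them:

```python
def insertion_sort(a):
    a = list(a)
    n0 = len(a)                                   # |a0|
    for i in range(1, len(a)):
        # outer invariant: prefix a[0..i-1] sorted, length unchanged
        assert all(a[k] <= a[k + 1] for k in range(i - 1)) and len(a) == n0
        t = a[i]
        j = i - 1
        while j >= 0 and a[j] > t:
            # inner invariant (cf. Fig. 2): once a shift has happened
            # (j < i - 1), every element of a[j+1..i] exceeds t
            assert 0 <= j <= i - 1
            assert j == i - 1 or all(a[k] > t for k in range(j + 1, i + 1))
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = t
    # postcondition: sorted(0, |rv| - 1, rv)
    assert all(a[k] <= a[k + 1] for k in range(len(a) - 1))
    return a

print(insertion_sort([3, 1, 2]))  # [1, 2, 3]
```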
7.3 A Decision Procedure for Hashtables

We show how to encode an assertion language for hashtables, with parameter theories Telem^1, . . . , Telem^m for values, into TM^{{elemℓ}ℓ}. Hashtables have the following operations: put(h, k, v) returns the hashtable that is equal to h except that key k maps to value v; remove(h, k) returns the hashtable that is equal to h except that key k does not map to a value; and get(h, k) returns the value mapped by k, which is undetermined if h does not map k to any value. init(h) is true iff h does not map any key. For reasoning about keys, k ∈ keys(h) is true iff h maps k; key sets keys(h) can be unioned, intersected, and complemented.

For the encoding onto the map property fragment of TM, universal quantification is restricted to quantification over key variables; such variables may only be used in membership checking, k ∈ K, and gets, get(h, k). Finally, an init in the scope of a universal quantifier must appear positively. The encoding then works as follows:

1. Construct ψ ∧ ⊤ ≠ ⊥, for fresh constants ⊤ and ⊥.
2. Rewrite

ψ[put(h, k, v)]  ⇒  ψ[h′] ∧ h′ = h{k ← v} ∧ keysh′ = keysh{k ← ⊤}
ψ[remove(h, k)]  ⇒  ψ[h′] ∧ keysh′ = keysh{k ← ⊥}

for fresh variable h′.
3. Rewrite

ψ[get(h, k)]      ⇒  ψ[h[k]]
ψ[init(h)]        ⇒  ψ[(∀k)(keysh[k] = ⊥)]
ψ[k ∈ keys(h)]    ⇒  ψ[keysh[k] ≠ ⊥]
ψ[k ∈ K1 ∪ K2]    ⇒  ψ[k ∈ K1 ∨ k ∈ K2]
ψ[k ∈ K1 ∩ K2]    ⇒  ψ[k ∈ K1 ∧ k ∈ K2]
ψ[k ∈ K̄]          ⇒  ψ[¬(k ∈ K)]

where K, K1, and K2 are constructed from union, intersection, and complementation of membership atoms.

Note that we rely on the defined predicate of equality between maps, h1 = h2, which is defined by (∀k)(h1[k] = h2[k]). Subset checking between key sets, K1 ⊂ K2, and other useful operations can also be defined in this language.

An example specification might assert that (∀k ∈ keys(h))(get(h, k) ≥ 0). Suppose that a function modifies h; then a verification condition could be

(∀h, s, v, h′)( ((∀k ∈ keys(h)) get(h, k) ≥ 0) ∧ v ≥ 0 ∧ h′ = put(h, s, v)
                → (∀k ∈ keys(h′)) get(h′, k) ≥ 0 ) .

The key sets provide a mechanism for reasoning about modifying hashtables.
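The hashtable encoding has a direct executable model; the following Python sketch (illustrative names) mirrors the rewrite rules above, pairing a value map with a key map over fresh constants:

```python
TOP, BOT = "top", "bot"   # stand-ins for the fresh constants of Step 1

class HashModel:
    """Executable model of the hashtable-to-map encoding: a value map
    h plus a key map keys_h, where keys_h[k] != BOT iff k in keys(h)."""
    def __init__(self):
        self.h = {}              # value map
        self.keys_h = {}         # key map over {TOP, BOT}
    def put(self, k, v):
        self.h[k] = v            # h' = h{k <- v}
        self.keys_h[k] = TOP     # keys_h' = keys_h{k <- top}
    def remove(self, k):
        self.keys_h[k] = BOT     # keys_h' = keys_h{k <- bot}
    def get(self, k):
        return self.h.get(k)     # undetermined when k is unmapped
    def member(self, k):
        return self.keys_h.get(k, BOT) != BOT
    def is_init(self):
        return all(v == BOT for v in self.keys_h.values())

t = HashModel()
assert t.is_init()
t.put("x", 1); assert t.member("x") and t.get("x") == 1
t.remove("x"); assert not t.member("x")
```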
7.4 Implementation and Results

We implemented SATA in our verifying compiler, πVC, which verifies programs written in the pi (for Prove It) programming language. The syntax of the language is similar to that of Figure 2. We used CVC Lite [2] as the underlying decision procedure. We found that there is usually no need to instantiate quantifiers with all terms in I; instead, the implementation makes several attempts to prove a formula unsatisfiable. It first tries using the set R, then R ∪ B, and finally I. Moreover, common-sense rules restrict the instantiation in the early attempts. If any attempt results in an unsatisfiable formula, then the original formula is unsatisfiable; and if the formula of the final attempt is satisfiable, then the original formula is satisfiable.

Frame conditions are ubiquitous in verification conditions. Thus, the implementation performs a simple form of resolution to simplify the original formula. After rewriting based on equations in the antecedent, conjuncts in the consequent that are syntactically equal to conjuncts in the antecedent are replaced with true. In practice, the resulting index sets are smaller. The combination of the phased instantiation and simplification makes the decision procedure quite responsive in practice.

We implemented annotated versions of MergeSort, BubbleSort, InsertionSort, QuickSort, Sem-N, and BinarySearch for integer arrays in our programming language pi. Verifying that the sorting algorithms return sorted arrays required less than 20 seconds each (1 second for each of BubbleSort and InsertionSort). Verifying mutual exclusion in Sem-N required a second. Verifying the membership property of BinarySearch required a second. All tests were performed on a 3 GHz x86; memory was not an issue.

8 Future Work

Future work will focus on the decidability of the extension identified in Section 5; on the complexity of deciding satisfiability for the full array property fragment for particular element theories; and, most importantly, on generating inductive invariants in the array property fragment automatically.

Acknowledgments. We thank the reviewers, Tom Henzinger, and members of the STeP group for their insightful comments and suggestions on this work.

References

1. Armando, A., Ranise, S., and Rusinowitch, M. Uniform derivation of decision procedures by superposition. In International Workshop on Computer Science Logic (CSL) (2001), Springer-Verlag.
2. Barrett, C., and Berezin, S. CVC Lite: A new implementation of the cooperating validity checker. In Computer Aided Verification (CAV) (2004), Springer-Verlag.
3. Bradley, A. R., Manna, Z., and Sipma, H. B. Polyranking for polynomial loops. In submission; available at http://theory.stanford.edu/~arbrad.
4. Manna, Z., and Pnueli, A. Temporal Verification of Reactive Systems: Safety. Springer, 1995.
5. McCarthy, J. Towards a mathematical science of computation. In IFIP Congress 62 (1962).
6. Schrijver, A. Theory of Linear and Integer Programming. Wiley, 1986.
7. Stump, A., Barrett, C. W., Dill, D. L., and Levitt, J. R. A decision procedure for an extensional theory of arrays. In Logic in Computer Science (LICS) (2001).
8. Suzuki, N., and Jefferson, D. Verification decidability of Presburger array programs. J. ACM 27, 1 (1980).