Search | arXiv e-print repository

Online Machine Learning Techniques for Coq: A Comparison

Authors: Liao Zhang, Lasse Blaauwbroek, Bartosz Piotrowski, Prokop Černý, Cezary Kaliszyk, Josef Urban

Abstract: We present a comparison of several online machine learning techniques for tactical learning and proving in the Coq proof assistant. This work builds on top of Tactician, a plugin for Coq that learns from proofs written by the user to synthesize new proofs. Learning happens in an online manner, meaning that Tactician's machine learning model is updated immediately every time the user performs a step in an interactive proof. This has important advantages compared to the more studied offline learning systems: (1) it provides the user with a seamless, interactive experience with Tactician, and (2) it takes advantage of locality of proof similarity, which means that proofs similar to the current proof are likely to be found close by. We implement two online methods, namely approximate k-nearest neighbors based on locality sensitive hashing forests and random decision forests. Additionally, we conduct experiments with gradient boosted trees in an offline setting using XGBoost. We compare the relative performance of Tactician using these three learning methods on Coq's standard library.

Submitted 7 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Intelligent Computer Mathematics 14th International Conference, CICM 2021

arXiv:2006.01991 [pdf, other]

Detecting and Understanding Real-World Differential Performance Bugs in Machine Learning Libraries

Authors: Saeid Tizpaz-Niari, Pavol Cerný, Ashutosh Trivedi

Abstract: Programming errors that degrade the performance of systems are widespread, yet there is little tool support for analyzing these bugs. We present a method based on differential performance analysis: we find inputs for which the performance varies widely, despite having the same size. To ensure that the differences in the performance are robust (i.e., that they also hold for large inputs), we compare the performance of not only single inputs, but of classes of inputs, where each class has similar inputs parameterized by their size. Thus, each class is represented by a performance function from the input size to performance. Importantly, we also provide an explanation for why the performance differs in a form that can be readily used to fix a performance bug. The two main phases in our method are discovery with fuzzing and explanation with decision tree classifiers, each of which is supported by clustering. First, we propose an evolutionary fuzzing algorithm to generate inputs. For this fuzzing task, the unique challenge is that we need not only the input class with the worst performance, but a set of classes exhibiting differential performance. We use clustering to merge similar input classes, which significantly improves the efficiency of our fuzzer. Second, we explain the differential performance in terms of program inputs and internals. We adapt discriminant learning approaches with clustering and decision trees to localize suspicious code regions. We applied our techniques to a set of applications. On a set of micro-benchmarks, we show that our approach outperforms state-of-the-art fuzzers in finding inputs to characterize the differential performance. On a set of case studies, we discover and explain multiple performance bugs in popular machine learning frameworks. Four of these bugs, reported first in this paper, have since been fixed by the developers.

Submitted 2 June, 2020; originally announced June 2020.

Comments: To appear in ISSTA'20, 11 pages, 8 figures

ACM Class: D.2.5

arXiv:1907.10159 [pdf, other]

Efficient Detection and Quantification of Timing Leaks with Neural Networks

Authors: Saeid Tizpaz-Niari, Pavol Cerny, Sriram Sankaranarayanan, Ashutosh Trivedi

Abstract: Detection and quantification of information leaks through timing side channels are important to guarantee confidentiality. Although static analysis remains the prevalent approach for detecting timing side channels, it is computationally challenging for real-world applications. In addition, the detection techniques are usually restricted to 'yes' or 'no' answers. In practice, real-world applications may need to leak information about the secret. Therefore, quantification techniques are necessary to evaluate the resulting threats of information leaks. Since both problems are very difficult or impossible for static analysis techniques, we propose a dynamic analysis method. Our novel approach is to split the problem into two tasks. First, we learn a timing model of the program as a neural network. Second, we analyze the neural network to quantify information leaks. As demonstrated in our experiments, both of these tasks are feasible in practice, making the approach a significant improvement over the state-of-the-art side channel detectors and quantifiers. Our key technical contributions are (a) a neural network architecture that enables side channel discovery and (b) an MILP-based algorithm to estimate the side-channel strength. On a set of micro-benchmarks and real-world applications, we show that neural network models learn timing behaviors of programs with thousands of methods. We also show that neural networks with thousands of neurons can be efficiently analyzed to detect and quantify information leaks through timing side channels.

Submitted 23 July, 2019; originally announced July 2019.

Comments: To Appear in RV'19

arXiv:1906.08957 [pdf, other]

Quantitative Mitigation of Timing Side Channels

Authors: Saeid Tizpaz-Niari, Pavol Cerny, Ashutosh Trivedi

Abstract: Timing side channels pose a significant threat to the security and privacy of software applications. We propose an approach for mitigating this problem by decreasing the strength of the side channels as measured by entropy-based objectives, such as min-guess entropy. Our goal is to minimize the information leaks while guaranteeing a user-specified maximal acceptable performance overhead. We dub the decision version of this problem Shannon mitigation, and consider two variants, deterministic and stochastic. First, we show the deterministic variant is NP-hard. However, we give a polynomial algorithm that finds an optimal solution from a restricted set. Second, for the stochastic variant, we develop an algorithm that uses optimization techniques specific to the entropy-based objective used. For instance, for min-guess entropy, we use mixed integer linear programming. We apply the algorithm to a threat model where the attacker gets to make functional observations, that is, where she observes the running time of the program for the same secret value combined with different public input values. Existing mitigation approaches do not give confidentiality or performance guarantees for this threat model. We evaluate our tool SCHMIT on a number of micro-benchmarks and real-world applications with different entropy-based objectives. In contrast to the existing mitigation approaches, we show that in the functional-observation threat model, SCHMIT is scalable and able to maximize confidentiality under the performance overhead bound.

Submitted 21 June, 2019; originally announced June 2019.

Comments: To Appear in CAV 2019

arXiv:1810.10443 [pdf, ps, other]

Type-directed Bounding of Collections in Reactive Programs

Authors: Tianhan Lu, Pavol Cerny, Bor-Yuh Evan Chang, Ashutosh Trivedi

Abstract: Our aim is to statically verify that in a given reactive program, the length of collection variables does not grow beyond a given bound. We propose a scalable type-based technique that checks that each collection variable has a given refinement type that specifies constraints about its length. A novel feature of our refinement types is that the refinements can refer to AST counters that track how many times an AST node has been executed. This feature enables type refinements to track limited flow-sensitive information. We generate verification conditions that ensure that the AST counters are used consistently, and that the types imply the given bound. The verification conditions are discharged by an off-the-shelf SMT solver. Experimental results demonstrate that our technique is scalable, and effective at verifying reactive programs with respect to requirements on length of collections.

Submitted 28 January, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

arXiv:1808.10502 [pdf, other]

Data-Driven Debugging for Functional Side Channels

Authors: Saeid Tizpaz-Niari, Pavol Cerny, Ashutosh Trivedi

Abstract: Information leaks through side channels are a pervasive problem, even in security-critical applications. Functional side channels arise when an attacker knows that a secret value of a server stays fixed for a certain time. Then, the attacker can observe the server executions on a sequence of different public inputs, each paired with the same secret input. Thus for each secret, the attacker observes a function from public inputs to execution time, for instance, and she can compare these functions for different secrets. First, we introduce a notion of noninterference for functional side channels. We focus on the case of noisy observations, where we demonstrate with examples that there is a practical functional side channel in programs that would be deemed information-leak-free or be underestimated using the standard definition. Second, we develop a framework and techniques for debugging programs for functional side channels. We extend evolutionary fuzzing techniques to generate inputs that exploit functional dependencies of response times on public inputs. We adapt existing results and algorithms in functional data analysis to model the functions and discover the existence of side channels. We use a functional extension of standard decision tree learning to pinpoint the code fragments causing a side channel if there is one. We empirically evaluate the performance of our tool FUCHSIA on a series of micro-benchmarks and realistic Java programs. On the set of benchmarks, we show that FUCHSIA outperforms the state-of-the-art techniques in detecting side channel classes. On the realistic programs, we show the scalability of FUCHSIA in analyzing functional side channels in Java programs with thousands of methods. Also, we show the usefulness of FUCHSIA in finding side channels, including a zero-day vulnerability in OpenJDK and another vulnerability in Jetty that has since been fixed by the developers.

Submitted 7 February, 2020; v1 submitted 30 August, 2018; originally announced August 2018.

Comments: To Appear in NDSS'20 (17 pages, 11 figures)

arXiv:1802.08733 [pdf, other]

Conflict-Aware Replicated Data Types

Authors: Nicholas V. Lewchenko, Arjun Radhakrishna, Akash Gaonkar, Pavol Černý

Abstract: We introduce Conflict-Aware Replicated Data Types (CARDs). CARDs are significantly more expressive than Conflict-free Replicated Data Types (CRDTs) as they support operations that can conflict with each other. Introducing conflicting operations typically brings the need to block an operation in at least some executions, leading to difficulties in programming and reasoning about correctness, as well as potential inefficiencies in implementation. The salient aspect of CARDs is that they allow ease of programming and reasoning about programs comparable to CRDTs, while enabling algorithmic inference of conflicts so that an operation is blocked only when necessary. The key idea is to have a language that allows associating with each operation a two-state predicate called a consistency guard that relates the state of the replica on which the operation is executing to a global state (which is never computed). The consistency guards bring three advantages. First, a programmer developing an operation needs only to choose a consistency guard that states what the operation will rely on. In particular, they do not need to consider how the operation conflicts with other operations. This allows purely modular reasoning. Second, we show that consistency guards allow reducing the complexity of reasoning needed to prove invariants that hold as CARD operations are executing. The reason is that consistency guards allow reducing the reasoning about concurrency among operations to purely sequential reasoning. Third, conflicts among operations can be algorithmically inferred by checking whether the effect of one operation preserves the consistency guard of another operation.

Submitted 26 September, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

arXiv:1711.04076 [pdf, ps, other]

Differential Performance Debugging with Discriminant Regression Trees

Authors: Saeid Tizpaz-Niari, Pavol Cerny, Bor-Yuh Evan Chang, Ashutosh Trivedi

Abstract: Differential performance debugging is a technique to find performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences in asymptotic performance among various input classes in terms of program internals. We propose a data-driven technique based on the discriminant regression tree (DRT) learning problem, where the goal is to discriminate among different classes of inputs. We propose a new algorithm for DRT learning that first clusters the data into functional clusters, capturing different asymptotic performance classes, and then invokes off-the-shelf decision tree learning algorithms to explain these clusters. We focus on linear functional clusters and adapt classical clustering algorithms (K-means and spectral) to produce them. For the K-means algorithm, we generalize the notion of the cluster centroid from a point to a linear function. We adapt spectral clustering by defining a novel kernel function to capture the notion of linear similarity between two data points. We evaluate our approach on benchmarks consisting of Java programs where we are interested in debugging performance. We show that our algorithm significantly outperforms other well-known regression tree learning algorithms in terms of running time and accuracy of classification.

Submitted 28 November, 2017; v1 submitted 10 November, 2017; originally announced November 2017.

Comments: To Appear in AAAI 2018
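
The idea of replacing a point centroid with a linear function, mentioned in the abstract above, can be illustrated with a small sketch. This is not the authors' algorithm: the vertical-residual distance, the least-squares refit, the initial lines, and the names `fit_line`, `assign`, and `linear_kmeans` are all assumptions made for illustration.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b; horizontal line if all x are equal."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return a, mean_y - a * mean_x


def assign(points, lines):
    """Label each point with the index of the line with the smallest vertical residual."""
    return [
        min(range(len(lines)), key=lambda k: abs(y - (lines[k][0] * x + lines[k][1])))
        for x, y in points
    ]


def linear_kmeans(points, init_lines, max_iter=50):
    """K-means variant whose cluster 'centroids' are lines (a, b) instead of points."""
    lines = list(init_lines)
    labels = assign(points, lines)
    for _ in range(max_iter):
        # Update step: refit each line to the points currently assigned to it.
        for k in range(len(lines)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                lines[k] = fit_line(members)
        # Assignment step: stop once the labels no longer change.
        new_labels = assign(points, lines)
        if new_labels == labels:
            break
        labels = new_labels
    return lines, labels


# Hypothetical measurements: (input size, running time) for two input classes.
points = [(x, 1.0 * x) for x in range(1, 6)] + [(x, 3.0 * x) for x in range(1, 6)]
lines, labels = linear_kmeans(points, init_lines=[(0.5, 0.0), (4.0, 0.0)])
```

In this toy data set, each recovered line plays the role of a performance function from input size to running time for one input class.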

arXiv:1702.07103 [pdf, other]

Discriminating Traces with Time

Authors: Saeid Tizpaz-Niari, Pavol Cerny, Bor-Yuh Evan Chang, Sriram Sankaranarayanan, Ashutosh Trivedi

Abstract: What properties about the internals of a program explain the possible differences in its overall running time for different inputs? In this paper, we propose a formal framework for considering this question, which we dub trace-set discrimination. We show that even though the algorithmic problem of computing maximum likelihood discriminants is NP-hard, approaches based on integer linear programming (ILP) and decision tree learning can be useful in zeroing in on the program internals. On a set of Java benchmarks, we find that compactly-represented decision trees scalably discriminate with high accuracy: more scalably than maximum likelihood discriminants and with comparable accuracy. We demonstrate on three larger case studies how decision-tree discriminants produced by our tool are useful for debugging timing side-channel vulnerabilities (i.e., where a malicious observer infers secrets simply from passively watching execution times) and availability vulnerabilities.

Submitted 23 February, 2017; originally announced February 2017.

Comments: Published in TACAS 2017

arXiv:1701.07842 [pdf, other]

DroidStar: Callback Typestates for Android Classes

Authors: Arjun Radhakrishna, Nicholas V. Lewchenko, Shawn Meier, Sergio Mover, Krishna Chaitanya Sripada, Damien Zufferey, Bor-Yuh Evan Chang, Pavol Černý

Abstract: Event-driven programming frameworks, such as Android, are based on components with asynchronous interfaces. The protocols for interacting with these components can often be described by finite-state machines we dub callback typestates. Callback typestates are akin to classical typestates, with the difference that their outputs (callbacks) are produced asynchronously. While useful, these specifications are not commonly available, because writing them is difficult and error-prone. Our goal is to make the task of producing callback typestates significantly easier. We present a callback typestate assistant tool, DroidStar, that requires only limited user interaction to produce a callback typestate. Our approach is based on an active learning algorithm, L*. We improved the scalability of equivalence queries (a key component of L*), thus making active learning tractable on the Android system. We use DroidStar to learn callback typestates for Android classes both for cases where one is already provided by the documentation, and for cases where the documentation is unclear. The results show that DroidStar learns callback typestates accurately and efficiently. Moreover, in several cases, the synthesized callback typestates uncovered surprising and undocumented behaviors.

Submitted 2 March, 2018; v1 submitted 26 January, 2017; originally announced January 2017.

Comments: Appearing at ICSE 2018

arXiv:1607.05159 [pdf, ps, other]

Optimal Consistent Network Updates in Polynomial Time

Authors: Pavol Cerny, Nate Foster, Nilesh Jagnik, Jedidiah McClurg

Abstract: Software-defined networking (SDN) allows operators to control the behavior of a network by programmatically managing the forwarding rules installed on switches. However, as is common in distributed systems, it can be difficult to ensure that certain consistency properties are preserved during periods of reconfiguration. The widely-accepted notion of PER-PACKET CONSISTENCY requires every packet to be forwarded using the new configuration or the old configuration, but not a mixture of the two. If switches can be updated in some (partial) order which guarantees that per-packet consistency is preserved, we call this order a CONSISTENT ORDER UPDATE. In particular, switches that are incomparable in this order can be updated in parallel. We call a consistent order update OPTIMAL if it allows maximal parallelism. This paper presents a polynomial-time algorithm for finding an optimal consistent order update. This contrasts with other recent results in the literature, which show that for other classes of properties (e.g., loop-freedom and waypoint enforcement), the optimal update problem is NP-complete.

Submitted 18 July, 2016; originally announced July 2016.

ACM Class: C.2.3; D.3.2

arXiv:1602.00786

doi 10.4204/EPTCS.202

Proceedings Fourth Workshop on Synthesis

Authors: Pavol Černý, Viktor Kuncak, Madhusudan Parthasarathy

Abstract: The SYNT workshop aims to bring together researchers interested in the broad area of synthesis of computing systems. The goal is to foster the development of frontier techniques in automating the development of computing systems. Contributions of interest include algorithms, complexity and decidability analysis, as well as reproducible heuristics, implemented tools, and experimental evaluation. Application domains include software, hardware, embedded, and cyberphysical systems. Computation models include functional, reactive, hybrid and timed systems. Identifying, formalizing, and evaluating synthesis in particular application domains is encouraged. The fourth iteration of the workshop took place in San Francisco, CA, USA. It was co-located with the 27th International Conference on Computer Aided Verification. The workshop included five contributed talks and two invited talks. In addition, it featured a special session about the Syntax-Guided Synthesis Competition (SyGuS) and the SyntComp Synthesis competition.

Submitted 1 February, 2016; originally announced February 2016.

Journal ref: EPTCS 202, 2016

arXiv:1511.07163 [pdf, other]

Optimizing Solution Quality in Synchronization Synthesis

Authors: Pavol Černý, Edmund M. Clarke, Thomas A. Henzinger, Arjun Radhakrishna, Leonid Ryzhyk, Roopsha Samanta, Thorsten Tarrach

Abstract: Given a multithreaded program written assuming a friendly, non-preemptive scheduler, the goal of synchronization synthesis is to automatically insert synchronization primitives to ensure that the modified program behaves correctly, even with a preemptive scheduler. In this work, we focus on the quality of the synthesized solution: we aim to infer synchronization placements that not only ensure correctness, but also meet some quantitative objectives such as optimal program performance on a given computing platform. The key step that enables solution optimization is the construction of a set of global constraints over synchronization placements such that each model of the constraint set corresponds to a correctness-ensuring synchronization placement. We extract the global constraints from generalizations of counterexample traces and the control-flow graph of the program. The global constraints enable us to choose from among the encoded synchronization solutions using an objective function. We consider two types of objective functions: ones that are solely dependent on the program (e.g., minimizing the size of critical sections) and ones that are also dependent on the computing platform. For the latter, given a program and a computing platform, we construct a performance model based on measuring average contention for critical sections and the average time taken to acquire and release a lock under a given average contention. We show empirically that our approach scales to typical module sizes of many real-world concurrent programs such as device drivers and multithreaded servers, and that the performance predictions match reality. To the best of our knowledge, this is the first comprehensive approach for optimizing the placement of synthesized synchronization.

Submitted 23 November, 2015; originally announced November 2015.

arXiv:1507.07049 [pdf, other]

doi 10.1145/2908080.2908097

Event-Driven Network Programming

Authors: Jedidiah McClurg, Hossein Hojjat, Nate Foster, Pavol Cerny

Abstract: Software-defined networking (SDN) programs must simultaneously describe static forwarding behavior and dynamic updates in response to events. Event-driven updates are critical to get right, but difficult to implement correctly due to the high degree of concurrency in networks. Existing SDN platforms offer weak guarantees that can break application invariants, leading to problems such as dropped packets, degraded performance, security violations, etc. This paper introduces EVENT-DRIVEN CONSISTENT UPDATES that are guaranteed to preserve well-defined behaviors when transitioning between configurations in response to events. We propose NETWORK EVENT STRUCTURES (NESs) to model constraints on updates, such as which events can be enabled simultaneously and causal dependencies between events. We define an extension of the NetKAT language with mutable state, give semantics to stateful programs using NESs, and discuss provably-correct strategies for implementing NESs in SDNs. Finally, we evaluate our approach empirically, demonstrating that it gives well-defined consistency guarantees while avoiding expensive synchronization and packet buffering.

Submitted 15 April, 2016; v1 submitted 24 July, 2015; originally announced July 2015.

ACM Class: C.2.3; D.3.2; D.3.4

arXiv:1505.05868 [pdf, ps, other]

Synthesis through Unification

Authors: Rajeev Alur, Pavol Cerny, Arjun Radhakrishna

Abstract: Given a specification and a set of candidate programs (program space), the program synthesis problem is to find a candidate program that satisfies the specification. We present the synthesis through unification (STUN) approach, which is an extension of the counter-example guided inductive synthesis (CEGIS) approach. In CEGIS, the synthesizer maintains a subset S of inputs and a candidate program Prog that is correct for S. The synthesizer repeatedly checks if there exists a counter-example input c such that the execution of Prog is incorrect on c. If so, the synthesizer enlarges S to include c, and picks a program from the program space that is correct for the new set S. The STUN approach extends CEGIS with the idea that given a program Prog that is correct for a subset of inputs, the synthesizer can try to find a program Prog' that is correct for the rest of the inputs. If Prog and Prog' can be unified into a program in the program space, then a solution has been found. We present a generic synthesis procedure based on the STUN approach and specialize it for three different domains by providing the appropriate unification operators. We implemented these specializations in prototype tools, and we show that our tools often perform significantly better on standard benchmarks than a tool based on a pure CEGIS approach.

Submitted 21 May, 2015; originally announced May 2015.
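
The CEGIS loop summarized in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool and not the STUN extension: the program space is a small explicit list, the verifier exhaustively scans a finite input domain, and the names `cegis`, `spec`, `program_space`, and `domain` are hypothetical.

```python
def cegis(spec, program_space, domain):
    """Plain CEGIS over a finite program space and a finite input domain.

    spec(x, y) returns True when output y is acceptable for input x.
    Returns a program that is correct on every input in domain, or None.
    """
    examples = []  # the subset S of inputs collected so far
    while True:
        # Synthesis step: pick any candidate that is correct for every input in S.
        prog = next(
            (p for p in program_space if all(spec(x, p(x)) for x in examples)),
            None,
        )
        if prog is None:
            return None  # no program in the space is consistent with S
        # Verification step: search for a counterexample input c.
        cex = next((x for x in domain if not spec(x, prog(x))), None)
        if cex is None:
            return prog  # correct on the whole domain
        examples.append(cex)  # enlarge S with c and try again


# Toy instance (an assumption for illustration): synthesize absolute value.
program_space = [
    lambda x: x,
    lambda x: -x,
    lambda x: x if x >= 0 else -x,
]
spec = lambda x, y: y == abs(x)
result = cegis(spec, program_space, range(-3, 4))
```

Each counterexample rules out the candidate that produced it, so with a finite program space the loop terminates. In the toy instance, the first two candidates are eliminated by the counterexamples -3 and 1 before the third is accepted.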

arXiv:1505.04533 [pdf, ps, other]

From Non-preemptive to Preemptive Scheduling using Synchronization Synthesis

Authors: Pavol Černý, Edmund M. Clarke, Thomas A. Henzinger, Arjun Radhakrishna, Leonid Ryzhyk, Roopsha Samanta, Thorsten Tarrach

Abstract: We present a computer-aided programming approach to concurrency. The approach allows programmers to program assuming a friendly, non-preemptive scheduler, and our synthesis procedure inserts synchronization to ensure that the final program works even with a preemptive scheduler. The correctness specification is implicit, inferred from the non-preemptive behavior. Let us consider sequences of calls that the program makes to an external interface. The specification requires that any such sequence produced under a preemptive scheduler should be included in the set of such sequences produced under a non-preemptive scheduler. The solution is based on a finitary abstraction, an algorithm for bounded language inclusion modulo an independence relation, and rules for inserting synchronization. We apply the approach to device-driver programming, where the driver threads call the software interface of the device and the API provided by the operating system. Our experiments demonstrate that our synthesis method is precise and efficient, and, since it does not require explicit specifications, is more practical than the conventional approach based on user-provided assertions.

Submitted 18 May, 2015; originally announced May 2015.

Comments: Liss is published as open-source at https://github.com/thorstent/Liss, Computer Aided Verification 2015

arXiv:1407.3681 [pdf, ps, other]

doi 10.1007/978-3-319-08867-9_38

Regression-free Synthesis for Concurrency

Authors: Pavol Černý, Thomas A. Henzinger, Arjun Radhakrishna, Leonid Ryzhyk, Thorsten Tarrach

Abstract: While fixing concurrency bugs, program repair algorithms may introduce new concurrency bugs. We present an algorithm that avoids such regressions. The solution space is given by a set of program transformations we consider in the repair process. These include reordering of instructions within a thread and inserting atomic sections. The new algorithm learns a constraint on the space of candidate solutions, from both positive examples (error-free traces) and counterexamples (error traces). From each counterexample, the algorithm learns a constraint necessary to remove the errors. From each positive example, it learns a constraint that is necessary in order to prevent the repair from turning the trace into an error trace. We implemented the algorithm and evaluated it on simplified Linux device drivers with known bugs.

Submitted 14 July, 2014; originally announced July 2014.

Comments: for source code see https://github.com/thorstent/ConRepair

Journal ref: Computer Aided Verification, Lecture Notes in Computer Science Volume 8559, 2014, pp 568-584

arXiv:1403.7840 [pdf, other]

doi 10.4204/EPTCS.142.8

Toward Synthesis of Network Updates

Authors: Andrew Noyes, Todd Warszawski, Pavol Černý, Nate Foster

Abstract: Updates to network configurations are notoriously difficult to implement correctly. Even if the old and new configurations are correct, the update process can introduce transient errors such as forwarding loops, dropped packets, and access control violations. The key factor that makes updates difficult to implement is that networks are distributed systems with hundreds or even thousands of nodes, but updates must be rolled out one node at a time. In networks today, the task of determining a correct sequence of updates is usually done manually -- a tedious and error-prone process for network operators. This paper presents a new tool for synthesizing network updates automatically. The tool generates efficient updates that are guaranteed to respect invariants specified by the operator. It works by navigating through the (restricted) space of possible solutions, learning from counterexamples to improve scalability and optimize performance. We have implemented our tool in OCaml, and conducted experiments showing that it scales to networks with a thousand switches and tens of switches updating.

Submitted 30 March, 2014; originally announced March 2014.

Comments: In Proceedings SYNT 2013, arXiv:1403.7264

Journal ref: EPTCS 142, 2014, pp. 8-23

arXiv:1403.5843 [pdf, other]

doi 10.1145/2737924.2737980

Efficient Synthesis of Network Updates

Authors: Jedidiah McClurg, Hossein Hojjat, Pavol Cerny, Nate Foster

Abstract: Software-defined networking (SDN) is revolutionizing the networking industry, but current SDN programming platforms do not provide automated mechanisms for updating global configurations on the fly. Implementing updates by hand is challenging for SDN programmers because networks are distributed systems with hundreds or thousands of interacting nodes. Even if initial and final configurations are correct, naively updating individual nodes can lead to incorrect transient behaviors, including loops, black holes, and access control violations. This paper presents an approach for automatically synthesizing updates that are guaranteed to preserve specified properties. We formalize network updates as a distributed programming problem and develop a synthesis algorithm based on counterexample-guided search and incremental model checking. We describe a prototype implementation, and present results from experiments on real-world topologies and properties demonstrating that our tool scales to updates involving over one-thousand nodes.

Submitted 16 April, 2015; v1 submitted 23 March, 2014; originally announced March 2014.

ACM Class: D.2.4; F.3.1; F.4.1; C.2.3

arXiv:1210.2450 [pdf, other]

doi 10.4204/EPTCS.96.3

Interface Simulation Distances

Authors: Pavol Černý, Martin Chmelík, Thomas A. Henzinger, Arjun Radhakrishna

Abstract: The classical (boolean) notion of refinement for behavioral interfaces of system components is the alternating refinement preorder. In this paper, we define a distance for interfaces, called interface simulation distance. It makes the alternating refinement preorder quantitative by, intuitively, tolerating errors (while counting them) in the alternating simulation game. We show that the interface simulation distance satisfies the triangle inequality, that the distance between two interfaces does not increase under parallel composition with a third interface, and that the distance between two interfaces can be bounded from above and below by distances between abstractions of the two interfaces. We illustrate the framework, and the properties of the distances under composition of interfaces, with two case studies.

Submitted 8 October, 2012; originally announced October 2012.

Comments: In Proceedings GandALF 2012, arXiv:1210.2028

Journal ref: EPTCS 96, 2012, pp. 29-42

arXiv:1104.4306 [pdf, other]

Quantitative Synthesis for Concurrent Programs

Authors: Pavol Cerny, Krishnendu Chatterjee, Thomas Henzinger, Arjun Radhakrishna, Rohit Singh

Abstract: We present an algorithmic method for the quantitative, performance-aware synthesis of concurrent programs. The input consists of a nondeterministic partial program and of a parametric performance model. The nondeterminism allows the programmer to omit which (if any) synchronization construct is used at a particular program location. The performance model, specified as a weighted automaton, can capture system architectures by assigning different costs to actions such as locking, context switching, and memory and cache accesses. The quantitative synthesis problem is to automatically resolve the nondeterminism of the partial program so that both correctness is guaranteed and performance is optimal. As is standard for shared memory concurrency, correctness is formalized "specification free", in particular as race freedom or deadlock freedom. For worst-case (average-case) performance, we show that the problem can be reduced to 2-player graph games (with probabilistic transitions) with quantitative objectives. While we show, using game-theoretic methods, that the synthesis problem is NEXP-complete, we present an algorithmic method and an implementation that works efficiently for concurrent programs and performance models of practical interest. We have implemented a prototype tool and used it to synthesize finite-state concurrent programs that exhibit different programming patterns, for several performance models representing different architectures.

Submitted 21 April, 2011; originally announced April 2011.

arXiv:1007.4958 [pdf, other]

Algorithmic Verification of Single-Pass List Processing Programs

Authors: Rajeev Alur, Pavol Cerny

Abstract: We introduce streaming data string transducers that map input data strings to output data strings in a single left-to-right pass in linear time. Data strings are (unbounded) sequences of data values, tagged with symbols from a finite set, over a potentially infinite data domain that supports only the operations of equality and ordering. The transducer uses a finite set of states, a finite set of variables ranging over the data domain, and a finite set of variables ranging over data strings. At every step, it can make decisions based on the next input symbol, updating its state, remembering the input data value in its data variables, and updating data-string variables by concatenating data-string variables and new symbols formed from data variables, while avoiding duplication. We establish that the problems of checking functional equivalence of two streaming transducers, and of checking whether a streaming transducer satisfies pre/post verification conditions specified by streaming acceptors over input/output data-strings, are in PSPACE. We identify a class of imperative and a class of functional programs, manipulating lists of data items, which can be effectively translated to streaming data-string transducers. The imperative programs dynamically modify a singly-linked heap by changing next-pointers of heap-nodes and by adding new nodes. The main restriction specifies how the next-pointers can be used for traversal.
We also identify an expressively equivalent fragment of functional programs that traverse a list using syntactically restricted recursive calls. Our results lead to algorithms for assertion checking and for checking functional equivalence of two programs, written possibly in different programming styles, for commonly used routines such as insert, delete, and reverse.

Submitted 14 February, 2011; v1 submitted 28 July, 2010; originally announced July 2010.

Showing 1–22 of 22 results for author: Černý, P