Search | arXiv e-print repository

Algebraic Reasoning Meets Automata in Solving Linear Integer Arithmetic (Technical Report)

Authors: Peter Habermehl, Vojtěch Havlena, Michal Hečko, Lukáš Holík, Ondřej Lengál

Abstract: We present a new angle on solving quantified linear integer arithmetic based on combining the automata-based approach, where numbers are understood as bitvectors, with ideas from (nowadays prevalent) algebraic approaches, which work directly with numbers. This combination is enabled by a fine-grained version of the duality between automata and arithmetic formulae. In particular, we employ a constr… ▽ More We present a new angle on solving quantified linear integer arithmetic based on combining the automata-based approach, where numbers are understood as bitvectors, with ideas from (nowadays prevalent) algebraic approaches, which work directly with numbers. This combination is enabled by a fine-grained version of the duality between automata and arithmetic formulae. In particular, we employ a construction where states of automaton are obtained as derivatives of arithmetic formulae: then every state corresponds to a formula. Optimizations based on techniques and ideas transferred from the world of algebraic methods are used on thousands of automata states, which dramatically amplifies their effect. The merit of this combination of automata with algebraic methods is demonstrated by our prototype implementation being competitive to and even superior to state-of-the-art SMT solvers. △ Less

Submitted 18 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to CAV'24

arXiv:2310.10136 [pdf, other]

Mata, a Fast and Simple Finite Automata Library (Technical Report)

Authors: David Chocholatý, Tomáš Fiedor, Vojtěch Havlena, Lukáš Holík, Martin Hruška, Ondřej Lengál, Juraj Síč

Abstract: Mata is a well-engineered automata library written in C++ that offers a unique combination of speed and simplicity. It is meant to serve in applications such as string constraint solving and reasoning about regular expressions, and as a~reference implementation of automata algorithms. Besides basic algorithms for (non)deterministic automata, it implements a fast simulation reduction and antichain-… ▽ More Mata is a well-engineered automata library written in C++ that offers a unique combination of speed and simplicity. It is meant to serve in applications such as string constraint solving and reasoning about regular expressions, and as a~reference implementation of automata algorithms. Besides basic algorithms for (non)deterministic automata, it implements a fast simulation reduction and antichain-based language inclusion checking. The simplicity allows a straightforward access to the low-level structures, making it relatively easy to extend and modify. Besides the C++ API, the library also implements a Python binding. The library comes with a large benchmark of automata problems collected from relevant applications such as string constraint solving, regular model checking, and reasoning about regular expressions. We show that Mata is on this benchmark significantly faster than all libraries from a wide range of automata libraries we collected. Its usefulness in string constraint solving is demonstrated by the string solver Z3-Noodler, which is based on Mata and outperforms the state of the art in string constraint solving on many standard benchmarks. △ Less

Submitted 27 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.08327 [pdf, other]

Z3-Noodler: An Automata-based String Solver (Technical Report)

Authors: Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, Juraj Síč

Abstract: Z3-Noodler is a fork of Z3 that replaces its string theory solver with a custom solver implementing the recently introduced stabilization-based algorithm for solving word equations with regular constraints. An extensive experimental evaluation shows that Z3-Noodler is a fully-fledged solver that can compete with state-of-the-art solvers, surpassing them by far on many benchmarks. Moreover, it is o… ▽ More Z3-Noodler is a fork of Z3 that replaces its string theory solver with a custom solver implementing the recently introduced stabilization-based algorithm for solving word equations with regular constraints. An extensive experimental evaluation shows that Z3-Noodler is a fully-fledged solver that can compete with state-of-the-art solvers, surpassing them by far on many benchmarks. Moreover, it is often complementary to other solvers, making it a suitable choice as a candidate to a solver portfolio. △ Less

Submitted 17 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2303.01142 [pdf, other]

A Symbolic Algorithm for the Case-Split Rule in Solving Word Constraints with Extensions (Technical Report)

Authors: Yu-Fang Chen, Vojtěch Havlena, Ondřej Lengál, Andrea Turrini

Abstract: Case split is a core proof rule in current decision procedures for the theory of string constraints. Its use is the primary cause of the state space explosion in string constraint solving, since it is the only rule that creates branches in the proof tree. Moreover, explicit handling of the case split rule may cause recomputation of the same tasks in multiple branches of the proof tree. In this pap… ▽ More Case split is a core proof rule in current decision procedures for the theory of string constraints. Its use is the primary cause of the state space explosion in string constraint solving, since it is the only rule that creates branches in the proof tree. Moreover, explicit handling of the case split rule may cause recomputation of the same tasks in multiple branches of the proof tree. In this paper, we propose a symbolic algorithm that significantly reduces such a redundancy. In particular, we encode a string constraint as a regular language and proof rules as rational transducers. This allows us to perform similar steps in the proof tree only once, alleviating the state space explosion. We also extend the encoding to handle arbitrary Boolean combinations of string constraints, length constraints, and regular constraints. In our experimental results, we validate that our technique works in many practical cases where other state-of-the-art solvers fail to provide an answer; our Python prototype implementation solved over 50 % of string constraints that could not be solved by the other tools. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: An extended version of a paper from APLAS'20, accepted to Journal of Systems and Software

arXiv:2301.07747 [pdf, ps, other]

An Automata-based Framework for Verification and Bug Hunting in Quantum Circuits (Technical Report)

Authors: Yu-Fang Chen, Kai-Min Chung, Ondřej Lengál, Jyun-Ao Lin, Wei-Lun Tsai, Di-De Yen

Abstract: We introduce a new paradigm for analysing and finding bugs in quantum circuits. In our approach, the problem is given by a triple $\{P\}\,C\,\{Q\}$ and the question is whether, given a set $P$ of quantum states on the input of a circuit $C$, the set of quantum states on the output is equal to (or included in) a set $Q$. While this is not suitable to specify, e.g., functional correctness of a quant… ▽ More We introduce a new paradigm for analysing and finding bugs in quantum circuits. In our approach, the problem is given by a triple $\{P\}\,C\,\{Q\}$ and the question is whether, given a set $P$ of quantum states on the input of a circuit $C$, the set of quantum states on the output is equal to (or included in) a set $Q$. While this is not suitable to specify, e.g., functional correctness of a quantum circuit, it is sufficient to detect many bugs in quantum circuits. We propose a technique based on tree automata to compactly represent sets of quantum states and develop transformers to implement the semantics of quantum gates over this representation. Our technique computes with an algebraic representation of quantum states, avoiding the inaccuracy of working with floating-point numbers. We implemented the proposed approach in a prototype tool and evaluated its performance against various benchmarks from the literature. The evaluation shows that our approach is quite scalable, e.g., we managed to verify a large circuit with 40 qubits and 141,527 gates, or catch bugs injected into a circuit with 320 qubits and 1,758 gates, where all tools we compared with failed. In addition, our work establishes a connection between quantum program verification and automata, opening new possibilities to exploit the richness of automata theory and automata-based verification in the world of quantum computing. △ Less

Submitted 23 November, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

Comments: This is a technical report for the paper with the same name that appeared at PLDI'23

arXiv:2301.01890 [pdf, other]

Modular Mix-and-Match Complementation of Büchi Automata (Technical Report)

Authors: Vojtěch Havlena, Ondřej Lengál, Yong Li, Barbora Šmahlíková, Andrea Turrini

Abstract: Complementation of nondeterministic Büchi automata (BAs) is an important problem in automata theory with numerous applications in formal verification, such as termination analysis of programs, model checking, or in decision procedures of some logics. We build on ideas from a recent work on BA determinization by Li et al. and propose a new modular algorithm for BA complementation. Our algorithm all… ▽ More Complementation of nondeterministic Büchi automata (BAs) is an important problem in automata theory with numerous applications in formal verification, such as termination analysis of programs, model checking, or in decision procedures of some logics. We build on ideas from a recent work on BA determinization by Li et al. and propose a new modular algorithm for BA complementation. Our algorithm allows to combine several BA complementation procedures together, with one procedure for a subset of the BA's strongly connected components (SCCs). In this way, one can exploit the structure of particular SCCs (such as when they are inherently weak or deterministic) and use more efficient specialized algorithms, regardless of the structure of the whole BA. We give a general framework into which partial complementation procedures can be plugged in, and its instantiation with several algorithms. The framework can, in general, produce a complement with an Emerson-Lei acceptance condition, which can often be more compact. Using the algorithm, we were able to establish an exponentially better new upper bound of $O(4n)$ for complementation of the recently introduced class of elevator automata. We implemented the algorithm in a prototype and performed a comprehensive set of experiments on a large set of benchmarks, showing that our framework complements well the state of the art and that it can serve as a basis for future efficient BA complementation and inclusion checking algorithms. △ Less

Submitted 4 January, 2023; originally announced January 2023.

Comments: To appear in Proc. of TACAS'23

arXiv:2212.02317 [pdf, other]

Word Equations in Synergy with Regular Constraints (Technical Report)

Authors: František Blahoudek, Yu-Fang Chen, David Chocholatý, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, Juraj Síč

Abstract: When eating spaghetti, one should have the sauce and noodles mixed instead of eating them separately. We argue that also in string solving, word equations and regular constraints are better mixed together than approached separately as in most current string solvers. We propose a fast algorithm, complete for the fragment of chain-free constraints, in which word equations and regular constraints are… ▽ More When eating spaghetti, one should have the sauce and noodles mixed instead of eating them separately. We argue that also in string solving, word equations and regular constraints are better mixed together than approached separately as in most current string solvers. We propose a fast algorithm, complete for the fragment of chain-free constraints, in which word equations and regular constraints are tightly integrated and exchange information, efficiently pruning the cases generated by each other and limiting possible combinatorial explosion. The algorithm is based on a novel language-based characterisation of satisfiability of word equations with regular constraints. We experimentally show that our prototype implementation is competitive with the best string solvers and even superior in that it is the fastest on difficult examples and has the least number of timeouts. △ Less

Submitted 5 December, 2022; originally announced December 2022.

Comments: To appear in Proc. of FM'23

arXiv:2206.01946 [pdf, other]

Complementing Büchi Automata with Ranker (Technical Report)

Authors: Vojtěch Havlena, Ondřej Lengál, Barbora Šmahlíková

Abstract: We present the tool Ranker for complementing Büchi automata (BAs). Ranker builds on our previous optimizations of rank-based BA complementation and pushes them even further using numerous heuristics to produce even smaller automata. Moreover, it contains novel optimizations of specialized constructions for complementing (i) inherently weak automata and (ii) semi-deterministic automata, all deliver… ▽ More We present the tool Ranker for complementing Büchi automata (BAs). Ranker builds on our previous optimizations of rank-based BA complementation and pushes them even further using numerous heuristics to produce even smaller automata. Moreover, it contains novel optimizations of specialized constructions for complementing (i) inherently weak automata and (ii) semi-deterministic automata, all delivered in a robust tool. The optimizations significantly improve the usability of Ranker, as shown in an extensive experimental evaluation with real-world benchmarks, where Ranker produced in the majority of cases a strictly smaller complement than other state-of-the-art tools. △ Less

Submitted 4 June, 2022; originally announced June 2022.

arXiv:2205.12114 [pdf, other]

Register Set Automata (Technical Report)

Authors: Sabína Gulčíková, Ondřej Lengál

Abstract: We present register set automata (RsAs), a register automaton model over data words where registers can contain sets of data values and the following operations are supported: adding values to registers, clearing registers, and testing (non-)membership. We show that the emptiness problem for RsAs is decidable and complete for the $F_ω$ class. Moreover, we show that a large class of register automa… ▽ More We present register set automata (RsAs), a register automaton model over data words where registers can contain sets of data values and the following operations are supported: adding values to registers, clearing registers, and testing (non-)membership. We show that the emptiness problem for RsAs is decidable and complete for the $F_ω$ class. Moreover, we show that a large class of register automata can be transformed into deterministic RsAs, which can serve as a basis for (i) fast matching of a family of regular expressions with back-references and (ii) language inclusion algorithm for a sub-class of register automata. RsAs are incomparable in expressive power to other popular automata models over data words, such as alternating register automata and pebble automata. △ Less

Submitted 8 December, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

arXiv:2110.10187 [pdf, other]

Sky Is Not the Limit: Tighter Rank Bounds for Elevator Automata in Büchi Automata Complementation (Technical Report)

Authors: Vojtěch Havlena, Ondřej Lengál, Barbora Šmahlíková

Abstract: We propose several heuristics for mitigating one of the main causes of combinatorial explosion in rank-based complementation of Büchi automata (BAs): unnecessarily high bounds on the ranks of states. First, we identify elevator automata, which is a large class of BAs (generalizing semi-deterministic BAs), occurring often in practice, where ranks of states are bounded according to the structure of… ▽ More We propose several heuristics for mitigating one of the main causes of combinatorial explosion in rank-based complementation of Büchi automata (BAs): unnecessarily high bounds on the ranks of states. First, we identify elevator automata, which is a large class of BAs (generalizing semi-deterministic BAs), occurring often in practice, where ranks of states are bounded according to the structure of strongly connected components. The bounds for elevator automata also carry over to general BAs that contain elevator automata as a sub-structure. Second, we introduce two techniques for refining bounds on the ranks of BA states using data-flow analysis of the automaton. We implement out techniques as an extension of the tool Ranker for BA complementation and show that they indeed greatly prune the generated state space, obtaining significantly better results and outperforming other state-of-the-art tools on a large set of benchmarks. △ Less

Submitted 27 January, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: An extended version of a paper accepted to TACAS'22

arXiv:2010.07834 [pdf, other]

doi 10.4230/LIPIcs.CONCUR.2021.9

Reducing (to) the Ranks: Efficient Rank-based Büchi Automata Complementation (Technical Report)

Authors: Vojtěch Havlena, Ondřej Lengál

Abstract: This paper provides several optimizations of the rank-based approach for complementing Büchi automata. We start with Schewe's theoretically optimal construction and develop a set of techniques for pruning its state space that are key to obtaining small complement automata in practice. In particular, the reductions (except one) have the property that they preserve (at least some) so-called super-ti… ▽ More This paper provides several optimizations of the rank-based approach for complementing Büchi automata. We start with Schewe's theoretically optimal construction and develop a set of techniques for pruning its state space that are key to obtaining small complement automata in practice. In particular, the reductions (except one) have the property that they preserve (at least some) so-called super-tight runs, which are runs whose ranking is as tight as possible. Our evaluation on a large benchmark shows that the optimizations indeed significantly help the rank-based approach and that, in a large number of cases, the obtained complement is the smallest from those produced by a large number of state-of-the-art tools for Büchi complementation. △ Less

Submitted 21 July, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

Comments: Accepted at CONCUR'21

arXiv:1910.01996 [pdf, ps, other]

Succinct Determinisation of Counting Automata via Sphere Construction (Technical Report)

Authors: Lukáš Holík, Ondřej Lengál, Olli Saarikivi, Lenka Turoňová, Margus Veanes, Tomáš Vojnar

Abstract: We propose an efficient algorithm for determinising counting automata (CAs), i.e., finite automata extended with bounded counters. The algorithm avoids unfolding counters into control states, unlike the naïve approach, and thus produces much smaller deterministic automata. We also develop a simplified and faster version of the general algorithm for the sub-class of so-called monadic CAs (MCAs), i.… ▽ More We propose an efficient algorithm for determinising counting automata (CAs), i.e., finite automata extended with bounded counters. The algorithm avoids unfolding counters into control states, unlike the naïve approach, and thus produces much smaller deterministic automata. We also develop a simplified and faster version of the general algorithm for the sub-class of so-called monadic CAs (MCAs), i.e., CAs with counting loops on character classes, which are common in practice. Our main motivation is (besides applications in verification and decision procedures of logics) the application of deterministic (M)CAs in pattern matching regular expressions with counting, which are very common in e.g. network traffic processing and log analysis. We have evaluated our algorithm against practical benchmarks from these application domains and concluded that compared to the naïve approach, our algorithm is much less prone to explode, produces automata that can be several orders of magnitude smaller, and is overall faster. △ Less

Submitted 4 October, 2019; originally announced October 2019.

Comments: An extended version of a paper accepted at APLAS'19

arXiv:1905.08697 [pdf, ps, other]

Automata Terms in a Lazy WSkS Decision Procedure (Technical Report)

Authors: Vojtěch Havlena, Lukáš Holík, Ondřej Lengál, Tomáš Vojnar

Abstract: We propose a lazy decision procedure for the logic WSkS. It builds a term-based symbolic representation of the state space of the tree automaton (TA) constructed by the classical WSkS decision procedure. The classical decision procedure transforms the symbolic representation into a TA via a bottom-up traversal and then tests its language non-emptiness, which corresponds to satisfiability of the fo… ▽ More We propose a lazy decision procedure for the logic WSkS. It builds a term-based symbolic representation of the state space of the tree automaton (TA) constructed by the classical WSkS decision procedure. The classical decision procedure transforms the symbolic representation into a TA via a bottom-up traversal and then tests its language non-emptiness, which corresponds to satisfiability of the formula. On the other hand, we start evaluating the representation from the top, construct the state space on the fly, and utilize opportunities to prune away parts of the state space irrelevant to the language emptiness test. In order to do so, we needed to extend the notion of language terms (denoting language derivatives) used in our previous procedure for the linear fragment of the logic (the so-called WS1S) into automata terms. We implemented our decision procedure and identified classes of formulae on which our prototype implementation is significantly faster than the classical procedure implemented in the Mona tool. △ Less

Submitted 21 May, 2019; originally announced May 2019.

arXiv:1905.07139 [pdf, other]

Simulations in Rank-Based Büchi Automata Complementation (Technical Report)

Authors: Yu-Fang Chen, Vojtěch Havlena, Ondřej Lengál

Abstract: Complementation of Büchi automata is an essential technique used in some approaches for termination analysis of programs. The long search for an optimal complementation construction climaxed with the work of Schewe, who proposed a worst-case optimal rank-based procedure that generates complements of a size matching the theoretical lower bound of $(0.76n)^n$, modulo a polynomial factor of $O(n^2)$.… ▽ More Complementation of Büchi automata is an essential technique used in some approaches for termination analysis of programs. The long search for an optimal complementation construction climaxed with the work of Schewe, who proposed a worst-case optimal rank-based procedure that generates complements of a size matching the theoretical lower bound of $(0.76n)^n$, modulo a polynomial factor of $O(n^2)$. Although worst-case optimal, the procedure in many cases produces automata that are unnecessarily large. In this paper, we propose several ways of how to use the direct and delayed simulation relations to reduce the size of the automaton obtained in the rank-based complementation procedure. Our techniques are based on either (i) ignoring macrostates that cannot be used for accepting a word in the complement or (ii) saturating macrostates with simulation-smaller states, in order to decrease their total number. We experimentally showed that our techniques can indeed considerably decrease the size of the output of the complementation. △ Less

Submitted 4 October, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

Comments: An extended version of a paper to appear in Proc. of APLAS'19

arXiv:1904.10786 [pdf, other]

Deep Packet Inspection in FPGAs via Approximate Nondeterministic Automata

Authors: Milan Češka, Vojtěch Havlena, Lukáš Holík, Jan Kořenek, Ondřej Lengál, Denis Matoušek, Jiří Matoušek, Jakub Semrič, Tomáš Vojnar

Abstract: Deep packet inspection via regular expression (RE) matching is a crucial task of network intrusion detection systems (IDSes), which secure Internet connection against attacks and suspicious network traffic. Monitoring high-speed computer networks (100 Gbps and faster) in a single-box solution demands that the RE matching, traditionally based on finite automata (FAs), is accelerated in hardware. In… ▽ More Deep packet inspection via regular expression (RE) matching is a crucial task of network intrusion detection systems (IDSes), which secure Internet connection against attacks and suspicious network traffic. Monitoring high-speed computer networks (100 Gbps and faster) in a single-box solution demands that the RE matching, traditionally based on finite automata (FAs), is accelerated in hardware. In this paper, we describe a novel FPGA architecture for RE matching that is able to process network traffic beyond 100 Gbps. The key idea is to reduce the required FPGA resources by leveraging approximate nondeterministic FAs (NFAs). The NFAs are compiled into a multi-stage architecture starting with the least precise stage with a high throughput and ending with the most precise stage with a low throughput. To obtain the reduced NFAs, we propose new approximate reduction techniques that take into account the profile of the network traffic. Our experiments showed that using our approach, we were able to perform matching of large sets of REs from SNORT, a popular IDS, on unprecedented network speeds. △ Less

Submitted 24 April, 2019; originally announced April 2019.

Comments: In Proceedings of FCCM'19

arXiv:1807.08487 [pdf, ps, other]

Simulation Algorithms for Symbolic Automata (Technical Report)

Authors: Lukáš Holík, Ondřej Lengál, Juraj Síč, Margus Veanes, Tomáš Vojnar

Abstract: We investigate means of efficient computation of the simulation relation over symbolic finite automata (SFAs), i.e., finite automata with transitions labeled by predicates over alphabet symbols. In one approach, we build on the algorithm by Ilie, Navaro, and Yu proposed originally for classical finite automata, modifying it using the so-called mintermisation of the transition predicates. This solu… ▽ More We investigate means of efficient computation of the simulation relation over symbolic finite automata (SFAs), i.e., finite automata with transitions labeled by predicates over alphabet symbols. In one approach, we build on the algorithm by Ilie, Navaro, and Yu proposed originally for classical finite automata, modifying it using the so-called mintermisation of the transition predicates. This solution, however, generates all Boolean combinations of the predicates, which easily causes an exponential blowup in the number of transitions. Therefore, we propose two more advanced solutions. The first one still applies mintermisation but in a local way, mitigating the size of the exponential blowup. The other one focuses on a novel symbolic way of dealing with transitions, for which we need to sacrifice the counting technique of the original algorithm (counting is used to decrease the dependency of the running time on the number of transitions from quadratic to linear). We perform a thorough experimental evaluation of all the algorithms, together with several further alternatives, showing that all of them have their merits in practice, but with the clear indication that in most of the cases, efficient treatment of symbolic transitions is more beneficial than counting. △ Less

Submitted 27 July, 2018; v1 submitted 23 July, 2018; originally announced July 2018.

Comments: To appear in ATVA'18

arXiv:1710.10756 [pdf, other]

Fair Termination for Parameterized Probabilistic Concurrent Systems (Technical Report)

Authors: Ondrej Lengal, Anthony W. Lin, Rupak Majumdar, Philipp Ruemmer

Abstract: We consider the problem of automatically verifying that a parameterized family of probabilistic concurrent systems terminates with probability one for all instances against adversarial schedulers. A parameterized family defines an infinite-state system: for each number n, the family consists of an instance with n finite-state processes. In contrast to safety, the parameterized verification of live… ▽ More We consider the problem of automatically verifying that a parameterized family of probabilistic concurrent systems terminates with probability one for all instances against adversarial schedulers. A parameterized family defines an infinite-state system: for each number n, the family consists of an instance with n finite-state processes. In contrast to safety, the parameterized verification of liveness is currently still considered extremely challenging especially in the presence of probabilities in the model. One major challenge is to provide a sufficiently powerful symbolic framework. One well-known symbolic framework for the parameterized verification of non-probabilistic concurrent systems is regular model checking. Although the framework was recently extended to probabilistic systems, incorporating fairness in the framework - often crucial for verifying termination - has been especially difficult due to the presence of an infinite number of fairness constraints (one for each process). Our main contribution is a systematic, regularity-preserving, encoding of finitary fairness (a realistic notion of fairness proposed by Alur & Henzinger) in the framework of regular model checking for probabilistic parameterized systems. Our encoding reduces termination with finitary fairness to verifying parameterized termination without fairness over probabilistic systems in regular model checking (for which a verification framework already exists). We show that our algorithm could verify termination for many interesting examples from distributed algorithms (Herman's protocol) and evolutionary biology (Moran process, cell cycle switch), which do not hold under the standard notion of fairness. To the best of our knowledge, our algorithm is the first fully-automatic method that can prove termination for these examples. △ Less

Submitted 29 October, 2017; originally announced October 2017.

Comments: A technical report of a TACAS'17 paper

arXiv:1710.08647 [pdf, ps, other]

Approximate Reduction of Finite Automata for High-Speed Network Intrusion Detection (Technical Report)

Authors: Milan Ceska, Vojtech Havlena, Lukas Holik, Ondrej Lengal, Tomas Vojnar

Abstract: We consider the problem of approximate reduction of non-deterministic automata that appear in hardware-accelerated network intrusion detection systems (NIDSes). We define an error distance of a reduced automaton from the original one as the probability of packets being incorrectly classified by the reduced automaton (wrt the probabilistic distribution of packets in the network traffic). We use thi… ▽ More We consider the problem of approximate reduction of non-deterministic automata that appear in hardware-accelerated network intrusion detection systems (NIDSes). We define an error distance of a reduced automaton from the original one as the probability of packets being incorrectly classified by the reduced automaton (wrt the probabilistic distribution of packets in the network traffic). We use this notion to design an approximate reduction procedure that achieves a great size reduction (much beyond the state-of-the-art language-preserving techniques) with a controlled and small error. We have implemented our approach and evaluated it on use cases from Snort, a popular NIDS. Our results provide experimental evidence that the method can be highly efficient in practice, allowing NIDSes to follow the rapid growth in the speed of networks. △ Less

Submitted 21 February, 2018; v1 submitted 24 October, 2017; originally announced October 2017.

Comments: An extended version of a paper accepted at TACAS'18

arXiv:1704.03972 [pdf, ps, other]

Register automata with linear arithmetic

Authors: Yu-Fang Chen, Ondrej Lengal, Tony Tan, Zhilin Wu

Abstract: We propose a novel automata model over the alphabet of rational numbers, which we call register automata over the rationals (RA-Q). It reads a sequence of rational numbers and outputs another rational number. RA-Q is an extension of the well-known register automata (RA) over infinite alphabets, which are finite automata equipped with a finite number of registers/variables for storing values. Like… ▽ More We propose a novel automata model over the alphabet of rational numbers, which we call register automata over the rationals (RA-Q). It reads a sequence of rational numbers and outputs another rational number. RA-Q is an extension of the well-known register automata (RA) over infinite alphabets, which are finite automata equipped with a finite number of registers/variables for storing values. Like in the standard RA, the RA-Q model allows both equality and ordering tests between values. It, moreover, allows to perform linear arithmetic between certain variables. The model is quite expressive: in addition to the standard RA, it also generalizes other well-known models such as affine programs and arithmetic circuits. The main feature of RA-Q is that despite the use of linear arithmetic, the so-called invariant problem---a generalization of the standard non-emptiness problem---is decidable. We also investigate other natural decision problems, namely, commutativity, equivalence, and reachability. For deterministic RA-Q, commutativity and equivalence are polynomial-time inter-reducible with the invariant problem. △ Less

Submitted 17 May, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

ACM Class: F.1.1; F.4.3

arXiv:1702.02439 [pdf, ps, other]

An Executable Sequential Specification for Spark Aggregation

Authors: Yu-Fang Chen, Chih-Duo Hong, Ondřej Lengál, Shin-Cheng Mu, Nishant Sinha, Bow-Yaw Wang

Abstract: Spark is a new promising platform for scalable data-parallel computation. It provides several high-level application programming interfaces (APIs) to perform parallel data aggregation. Since execution of parallel aggregation in Spark is inherently non-deterministic, a natural requirement for Spark programs is to give the same result for any execution on the same data set. We present PureSpark, an… ▽ More Spark is a new promising platform for scalable data-parallel computation. It provides several high-level application programming interfaces (APIs) to perform parallel data aggregation. Since execution of parallel aggregation in Spark is inherently non-deterministic, a natural requirement for Spark programs is to give the same result for any execution on the same data set. We present PureSpark, an executable formal Haskell specification for Spark aggregate combinators. Our specification allows us to deduce the precise condition for deterministic outcomes from Spark aggregation. We report case studies analyzing deterministic outcomes and correctness of Spark programs. △ Less

Submitted 8 February, 2017; originally announced February 2017.

Comments: an extended version of a paper accepted at NETYS'17

arXiv:1701.06282 [pdf, other]

Lazy Automata Techniques for WS1S

Authors: Tomáš Fiedor, Lukáš Holík, Petr Janků, Ondřej Lengál, Tomáš Vojnar

Abstract: We present a new decision procedure for the logic WS1S. It originates from the classical approach, which first builds an automaton accepting all models of a formula and then tests whether its language is empty. The main novelty is to test the emptiness on the fly, while constructing a symbolic, term-based representation of the automaton, and prune the constructed state space from parts irrelevant… ▽ More We present a new decision procedure for the logic WS1S. It originates from the classical approach, which first builds an automaton accepting all models of a formula and then tests whether its language is empty. The main novelty is to test the emptiness on the fly, while constructing a symbolic, term-based representation of the automaton, and prune the constructed state space from parts irrelevant to the test. The pruning is done by a generalization of two techniques used in antichain-based language inclusion and universality checking of finite automata: subsumption and early termination. The richer structure of the WS1S decision problem allows us, however, to elaborate on these techniques in novel ways. Our experiments show that the proposed approach can in many cases significantly outperform the classical decision procedure (implemented in the MONA tool) as well as recently proposed alternatives. △ Less

Submitted 24 January, 2017; v1 submitted 23 January, 2017; originally announced January 2017.

Comments: Technical Report for a paper to be published in TACAS'17

arXiv:1511.00754 [pdf, ps, other]

PAC Learning-Based Verification and Model Synthesis

Authors: Yu-Fang Chen, Chiao Hsieh, Ondřej Lengál, Tsung-Ju Lii, Ming-Hsien Tsai, Bow-Yaw Wang, Farn Wang

Abstract: We introduce a novel technique for verification and model synthesis of sequential programs. Our technique is based on learning a regular model of the set of feasible paths in a program, and testing whether this model contains an incorrect behavior. Exact learning algorithms require checking equivalence between the model and the program, which is a difficult problem, in general undecidable. Our lea… ▽ More We introduce a novel technique for verification and model synthesis of sequential programs. Our technique is based on learning a regular model of the set of feasible paths in a program, and testing whether this model contains an incorrect behavior. Exact learning algorithms require checking equivalence between the model and the program, which is a difficult problem, in general undecidable. Our learning procedure is therefore based on the framework of probably approximately correct (PAC) learning, which uses sampling instead and provides correctness guarantees expressed using the terms error probability and confidence. Besides the verification result, our procedure also outputs the model with the said correctness guarantees. Obtained preliminary experiments show encouraging results, in some cases even outperforming mature software verifiers. △ Less

Submitted 2 November, 2015; originally announced November 2015.

Comments: 11 pages

arXiv:1501.03849 [pdf, other]

Nested Antichains for WS1S

Authors: Tomas Fiedor, Lukas Holik, Ondrej Lengal, Tomas Vojnar

Abstract: We propose a novel approach for coping with alternating quantification as the main source of nonelementary complexity of deciding WS1S formulae. Our approach is applicable within the state-of-the-art automata-based WS1S decision procedure implemented, e.g. in MONA. The way in which the standard decision procedure processes quantifiers involves determinization, with its worst case exponential compl… ▽ More We propose a novel approach for coping with alternating quantification as the main source of nonelementary complexity of deciding WS1S formulae. Our approach is applicable within the state-of-the-art automata-based WS1S decision procedure implemented, e.g. in MONA. The way in which the standard decision procedure processes quantifiers involves determinization, with its worst case exponential complexity, for every quantifier alternation in the prefix of a formula. Our algorithm avoids building the deterministic automata---instead, it constructs only those of their states needed for (dis)proving validity of the formula. It uses a symbolic representation of the states, which have a deeply nested structure stemming from the repeated implicit subset construction, and prunes the search space by a nested subsumption relation, a generalization of the one used by the so-called antichain algorithms for handling nondeterministic automata. We have obtained encouraging experimental results, in some cases outperforming MONA by several orders of magnitude. △ Less

Submitted 15 January, 2015; originally announced January 2015.

Comments: Accepted to TACAS'15

Report number: FIT-TR-2014-06

arXiv:1304.5806 [pdf, ps, other]

Fully Automated Shape Analysis Based on Forest Automata

Authors: Lukas Holik, Ondrej Lengal, Adam Rogalewicz, Jiri Simacek, Tomas Vojnar

Abstract: Forest automata (FA) have recently been proposed as a tool for shape analysis of complex heap structures. FA encode sets of tree decompositions of heap graphs in the form of tuples of tree automata. In order to allow for representing complex heap graphs, the notion of FA allowed one to provide user-defined FA (called boxes) that encode repetitive graph patterns of shape graphs to be used as alphab… ▽ More Forest automata (FA) have recently been proposed as a tool for shape analysis of complex heap structures. FA encode sets of tree decompositions of heap graphs in the form of tuples of tree automata. In order to allow for representing complex heap graphs, the notion of FA allowed one to provide user-defined FA (called boxes) that encode repetitive graph patterns of shape graphs to be used as alphabet symbols of other, higher-level FA. In this paper, we propose a novel technique of automatically learning the FA to be used as boxes that avoids the need of providing them manually. Further, we propose a significant improvement of the automata abstraction used in the analysis. The result is an efficient, fully-automated analysis that can handle even as complex data structures as skip lists, with the performance comparable to state-of-the-art fully-automated tools based on separation logic, which, however, specialise in dealing with linked lists only. △ Less

Submitted 21 April, 2013; originally announced April 2013.

Comments: Accepted to CAV'13

arXiv:1204.3240 [pdf, other]

An Efficient Finite Tree Automata Library

Authors: Ondřej Lengál

Abstract: Numerous computer systems use dynamic control and data structures of unbounded size. These data structures have often the character of trees or they can be encoded as trees with some additional pointers. This is exploited by some currently intensively studied techniques of formal verification that represent an infinite number of states using a finite tree automaton. However, currently there is no… ▽ More Numerous computer systems use dynamic control and data structures of unbounded size. These data structures have often the character of trees or they can be encoded as trees with some additional pointers. This is exploited by some currently intensively studied techniques of formal verification that represent an infinite number of states using a finite tree automaton. However, currently there is no tree automata library implementation that would provide an efficient and flexible support for such methods. Thus the aim of this Master's Thesis is to provide such a library. The present paper first describes the theoretical background of finite tree automata and regular tree languages. Then it surveys the current implementations of tree automata libraries and studies various verification techniques, outlining requirements for the library. Representation of a finite tree automaton and algorithms that perform standard language operations on this representation are proposed in the next part, which is followed by description of library implementation. Through a series of experiments it is shown that the library can compete with other available tree automata libraries, in certain areas being even significantly superior to them. △ Less

Submitted 15 April, 2012; originally announced April 2012.

Comments: Master's thesis

Showing 1–25 of 25 results for author: Lengál, O