
Explanatory Denotational Semantics for Complex Event Patterns

Published: 20 November 2023
Abstract

    Recent years have brought popularity and importance to complex event processing (CEP) and the associated query languages. CEP systems can be hard to understand. It is often non-trivial to determine the streams of events matched by a query, and sometimes we may not notice important edge cases. Hence, it is desirable to have formal semantics which permit reasoning about the resulting complex events and checking whether actual matchings agree with our intentions. In this article, we introduce a pattern language PatLang with some unique syntactic features related to variable binding. We provide two distinct denotational semantics for PatLang: minimal semantics, sufficient to describe when patterns match, and tree semantics, which provides detailed information about the subpatterns with which the matched events actually match, i.e., information about the interpretation of matched events induced by the pattern matching. The tree semantics is unnecessary for verifying correctness of pattern matching execution. However, we show that neither minimal semantics nor semantics from the prior work suffices to effectively locate errors in patterns with respect to their intended meaning, and that the additional information provided by the tree semantics is crucial for that purpose. We prove that tree semantics can be mapped to minimal semantics. Finally, we provide some practical evaluation.

    1 Introduction

    Many organisations and enterprises produce large amounts of data which need to be processed in real time to alert the stakeholders about dangers, actions which must be taken, or business opportunities. Examples include monitoring credit card transactions to discover fraud, transportation and logistics (tracking of passengers and goods, discovering bottlenecks, misaligned flights or lost luggage), automatic stock trading, wireless sensor networks, and so on (see e.g., [10, 24]). Nowadays, this data, in the form of an event stream, is often fed into a complex event processing (CEP) system (see e.g., [2, 4, 10, 28]). CEP systems recognize patterns of events in the stream and emit higher-order events which may provide a direct alert to the operator or be fed back into the event stream. Patterns and output events (i.e., queries) are specified using query languages of varying complexity and expressiveness (see e.g., [2, 6, 9, 16, 19, 20, 26, 29]). Most CEP systems use non-deterministic automata both to evaluate queries and to provide semantics for the queries (see e.g., [6, 11, 14, 28, 29]). Alternatives include tree-based models (see e.g., [26]) with patterns defining trees of event operators, and logic-based models such as chronicle recognition systems ([12]), Prolog ([1]), and event calculus ([3]) based on fluents ([22]).
    CEP systems can easily become overwhelming and hard to understand. In particular, it is often non-trivial to ascertain which streams of events will be matched by a given query (even if the query is expressed in a high-level language), and in some cases we may overlook important edge cases. Hence, it is desirable to have high-level query languages with robust formal semantics (see e.g., [7, 18]) which permit reasoning about resulting complex events and verification of whether actual matchings agree with the intended meaning of the given pattern. In this article, we introduce a somewhat “regular expression-like” pattern language called PatLang. The language was directly inspired by [18, 19]. It bears some similarity to both the first order and second order languages described in the aforementioned articles. However, PatLang also implements some additional pattern constructors and has some unique syntactic features pertaining to variable binding (particularly scoping and shadowing rules we explain later in detail) which we believe to be interesting on their own (although the binding rules in PatLang are original and look exotic when compared to, say, first order logic, they bear some similarity to binding rules in dynamic predicate logic (DPL) [21] and the Neo4J query language Cypher [15]). More importantly, we provide two distinct, novel denotational semantics for PatLang. One, called minimal semantics, is as simple as possible while still describing when a pattern matches and the induced bindings of variables. The other, called tree semantics, provides detailed information about the parts of the stream matching with subpatterns of the given pattern, i.e., information about the interpretation of events in the event stream induced by the pattern matching. This tree semantics is unnecessary for verifying correctness of pattern matching execution. However, we demonstrate using practical examples that neither minimal semantics nor a more complex semantics from the prior work is sufficient for effectively locating errors in certain patterns (with respect to their intended meaning), and that the additional information provided by the tree semantics is crucial for that purpose. We also prove that one can transform tree semantics into minimal semantics by essentially dropping the additional information.
    Both denotational semantics of PatLang are distinct from [18, 19, 20], as well as (to the best of our knowledge) from other semantics in the prior work. [18] was the first attempt to construct denotational semantics for compositional patterns, where the prior work used either execution semantics [28] or semantics based on logic [1]. The article was followed by [19, 20], where a second order pattern language was introduced. PatLang and its semantics differ from [18, 19, 20] in the following ways:
    The purpose of [18] was to provide a theoretical foundation for the analysis of complex event systems, and hence both the pattern language considered and its semantics had to be as minimalistic as possible to aid the development of mathematical results and proofs. In contrast, we want to have a complete and practical language to be ultimately implemented in both an event processing system and an associated verification tool (see [25]). Thus, [18, 19] keep filtering predicates and (in the case of [19]) aggregation functions abstract. This makes the presentation simple and provides a nice separation of concerns. Here, however, the syntax of PatLang includes concrete syntax for predicates and aggregates.
    Neither of the languages presented in [18, 19] includes explicit bounded negation pattern constructors (although the second order language in [19] can, in principle, represent some negated patterns indirectly). The language in [20] has bounded negation of the form \(\phi\) unless \(\psi\) which matches what \(\phi\) matches unless \(\psi\) also matches. PatLang includes bounded negation of the form \(\phi \mathbin {\small {noskip}}\psi\) which forbids skipping events matched by \(\psi\) ( \(\psi\) matches only single events) while matching \(\phi\) . Note that this form of bounded negation is distinct from \(\phi\) unless \(\psi\) in [20] (even with \(\psi\) limited to patterns matching single events) since \(\phi\) unless \(\psi\) forbids events matched by \(\psi\) whether they are among events matched by \(\phi\) or not. It is easy to see that the _ \(\mathbin {\small {noskip}}\) _ operator can be used to implement the strict selection strategy as well as contiguous sequencing and iteration from [20]. Bounded negation strongly influences both semantics of PatLang we present here.
    We also included explicit time handling and aggregation, absent from [18, 19].
    Matching patterns for events are founded on simple patterns of the form \(R\mathbin {\small {as}}x\) which match a single event of type \(R\) and bind it to the variable \(x\) . However, in [18, 19] the meaning \(⟦ R\mathbin {\small {as}}x⟧ _{\nu , \Gamma }\) of this pattern is defined relative to a substitution \(\nu\) which must assign to \(x\) a position of the matched event in the stream \(\Gamma\) , i.e., in the pattern \(R\mathbin {\small {as}}x\) variable \(x\) is treated by the semantics as free. This is reasonable in the context of [18], where the only role of the pattern was to match a stream fragment, and it simplifies the presentation. However, it is unnatural if we want the pattern to be the source of variable bindings used in output term constructs. Hence, in both semantics presented here, \(R\mathbin {\small {as}}x\) is treated as a binding construct, and we provide a precise description of its scope.
    The semantics in [18] assigns to patterns sets of positions in the stream, each set corresponding to the complex event matched by the pattern. It ignores, however, information about which part of the pattern a matched stream fragment corresponds to (which was understandable in the context of the approach presented in [18]). This problem does not apply to the same degree in the case of the second order language described in [19], where subpatterns can be captured in variables, but it is still there. However, as we show in this article, such information is useful when verifying correctness of the patterns. On the other hand, information about which events were matched may be excessive when we want to verify the correctness of an implementation: in this case, we care only about which finite subsequence of events triggered the matching, and what is the associated binding of variables. Accordingly, to satisfy the contradictory requirements of various potential applications we provide two denotational semantics: a minimal one, which associates to a pattern (and an event stream) a set of intervals of positions in the stream and associated variable bindings, and tree semantics, which assigns to the pattern (and a stream) a set of trees. The leaves store positions of the matched events in the input stream. The internal nodes correspond to positions in the input pattern the descendant leaves match. Thus, tree semantics provides richer information which includes the roles of matched events.
    In [19, 20] event variables are bound to sets of event positions instead of single events/positions. This greatly simplifies the treatment of iteration (and aggregation). It also allows [20] to replace variable shadowing with taking the union of competing variable values, which again greatly simplifies the rules. While we appreciate the simplicity and elegance of this formalism, we nevertheless decided to use a more traditional approach with single-event variables and explicit aggregation since we believe it is more intuitively understandable for potential users. Also, as we show in Example 7.3, while the auxiliary semantics of [20] can in many cases preserve the information about the part of the pattern where a given event was matched (as our tree semantics does), in general, this information is lost.
    Finally, we do not use selection strategies [18, 20] although, as remarked above, strict selection strategy can be implemented (and with finer control) using \(\mathbin {\small {noskip}}\) operator in PatLang.
    The proof-of-concept version of a pattern exploration tool based on PatLang and its tree semantics can be downloaded from GitHub [30].

    1.1 Outline of the Article

    In Section 2, we introduce (using examples) both the pattern language and its informal semantics. We also show the full query language (not referred to in subsequent sections) to demonstrate how PatLang is intended to be used. Finally, the examples demonstrate the difficulty of writing the pattern which matches all intended complex events and nothing more.
    Section 3 contains preliminaries on lists, substitutions and event schemas.
    In Section 4, we introduce the syntax of event patterns. In particular, we describe formally and in detail the binding structure of the patterns.
    In Sections 5 and 6, we describe, respectively, minimal and tree semantics of event patterns in PatLang. In order to be able to define minimal semantics we also define an additional semantics which is essentially a reformulation and adaptation of semantics from [18] to PatLang and our approach to binding variables. Finally, we prove that the minimal semantics is an abstraction via this additional semantics of the tree semantics.
    Section 7 contains examples showing the need for additional details provided by tree semantics to verify patterns with respect to their intended meaning.
    Finally, Section 8 contains a discussion of complexity issues, a description of the pattern exploration tool we created, and some experimental results.

    2 Motivating Example

    Formalization of syntax and semantics of PatLang requires a lot of rather dry definitions and lemmas bound to bog down the reader with boring details unless some high-level, semi-formal overview of what we intend to achieve is provided first. Accordingly, we devote this section to a concrete example (versions of which frequently appear in the literature, see e.g., [28]) of a CEP system which processes RFID sensor data in a store. Such sensor data can be used to prevent shoplifting, misplacement of wares, spoiling because of excess storage temperature, and so on. We use this example to introduce informally both the syntax and semantics of PatLang. We begin by introducing types of events used in the example. Events are represented here by pairs consisting of a type (like a name of a relational variable) and a relational record, where the type determines the set of attributes. We assume that for each event type the set of attributes includes timestamp \(\tau\) (which tells us when the event happened). We consider timestamps to be represented by real numbers (e.g., interpreted as the number of seconds from some fixed moment).
    An event schema \(\mathcal {R}\) is a set of declarations of the form \(f(A_0,\ldots , A_n)\) , where \(f\) is an event type and \(A_0\) , \(\ldots\) , \(A_n\) are the associated attributes. Given \(f(A_0,\ldots , A_n)\in \mathcal {R}\) we write the record of type \(f\) such that the attribute \(A_i\) has value \(v_i\) for \(i\in \lbrace 0,\ldots , n\rbrace\) as \(f(v_0,\ldots ,v_n)\) . We denote the set of events in the event schema \(\mathcal {R}\) by \(\small{Events}_\mathcal {R}\) . In this section, we consider the event schema \(\mathcal {R}\) consisting of the following declarations:
    \(\mathit {put}(\mathit {id},\mathit {sid}, \mathit {temp}, \tau)\) —an item with identifier \(\mathit {id}\) and preferred storage temperature equal to \(\mathit {temp}\) was put on the shelf identified by \(\mathit {sid}\) at time \(\tau\) ,
    \(\mathit {get}(\mathit {id},\mathit {sid}, \mathit {temp}, \tau)\) —an item was taken from the shelf at \(\tau\) . The other attributes have the same meaning as in case of \(\mathit {put}\) -events,
    \(\mathit {pay}(\mathit {id}, \tau)\) —the customer paid for item \(\mathit {id}\) at time \(\tau\) ,
    \(\mathit {out}(\mathit {id}, \tau)\) —the item \(\mathit {id}\) left the shop at time \(\tau\) ,
    \(\mathit {shelf}(\mathit {sid},\mathit {temp}, \tau)\) —the shelf \(\mathit {sid}\) reported temperature \(\mathit {temp}\) at time \(\tau\) .
    A stream of events is properly a mapping \(\Gamma :\mathbb {N}\rightarrow \small{Events}_\mathcal {R}\) . Thus, a stream is infinite “in the direction of future”. When writing finite fragments of a stream \(\Gamma\) we will write it as a list of its values for an interval of natural numbers, e.g., \([\Gamma (m),\Gamma (m+1),\ldots ,\Gamma (m+n-1), \Gamma (m+n)]\) .
    Example 2.1.
    The following finite fragment of a stream of events in the schema \(\mathcal {R}\) indicates the possibility of shoplifting: “the item with identifier 1 was taken from the shelf but never put back”. It also indicates the possibility of spoiling the item with identifier 2 due to it being moved to a too warm (and still warming up) shelf.
    \(\begin{multline*} [\mathit {shelf}(0,10.5, 3),\mathit {put}(2, 0, 10.0, 4),\mathit {get}(1,7, 18.0, 7),\\ \mathit {shelf}(0,11.5, 10),\mathit {shelf}(0,12.0, 15),\mathit {shelf}(0,18.5, 20), \mathit {out}(1,25)]. \end{multline*}\)
    It is obvious for humans that the stream fragment above can be interpreted as in the example. However, in order for a CEP system to recognize the respective situations and issue the appropriate alert, we have to specify formally the indicators of shoplifting and the possibility of spoilage. A declarative way of doing that is to use special purpose query or pattern language. The language PatLang introduced in this article is related to regular expressions and was heavily inspired by a combination of [18] and [19].
    Example 2.2.
    The following query matches cases of shoplifting and constructs the complex event (with schema \(\mathit {theft}(\mathit {id}, \mathit {sid}, \mathit {time\_taken}, \tau)\) ) alerting to the theft of the item identified by \(\mathit {id}\) , taken by a thief from shelf \(\mathit {sid}\) at \(\mathit {time\_taken}\) , and taken out of the shop without paying at \(\tau\) . The latter becomes the timestamp of the whole complex event, since this is the moment when the legitimate taking of the item from the shelf by a customer or an employee becomes actual theft.
    \(\begin{align} &\small{SELECT}\quad \mathit {theft}(x\mathbin {.}\mathit {id}, x\mathbin {.}\mathit {sid}, x\mathbin {.}\tau ,y\mathbin {.}\tau)\nonumber \nonumber\\ &\small{FROM}\quad (\mathit {get} \mathbin {\small {as}}x)\mathbin {;}((\mathit {out}\mathbin {\small {as}}y \mathbin {\small {filter}}x\mathbin {.}\mathit {id}=y\mathbin {.}\mathit {id})\mathbin {\small {noskip}}(\nonumber \nonumber\\ & \quad \quad \quad \quad \quad \quad (\mathit {pay}\mathbin {\small {as}}p \mathbin {\small {filter}}p\mathbin {.}\mathit {id}=x\mathbin {.}\mathit {id})\mathbin {\small {or}}(\mathit {put}\mathbin {\small {as}}z \mathbin {\small {filter}}z\mathbin {.}\mathit {id}=x\mathbin {.}\mathit {id})\nonumber \nonumber\\ &\quad \quad \quad \quad)). \end{align}\)
    (1)
    Let us explain the details of the above query. First, note its high-level structure:
    \(\begin{equation*} \small{SELECT}\quad \mathit {output\_event}\quad \small{FROM}\quad \mathit {input\_pattern} \end{equation*}\)
    where output_event is an expression which constructs a complex event that is a “result” of this query, and input_pattern is a pattern which matches part of the queried stream and binds some variables to events which can then be used in output_event expression to construct the output. After this section, we will be concerned only with the input_pattern sub-language.
    Now, what are the indications of theft? Answer: an item taken from the shelf and then out of the shop without being paid for in between. So our first attempt at a pattern recognizing theft might be as follows:
    \(\begin{equation} (\mathit {get} \mathbin {\small {as}}x)\mathbin {;}(\mathit {out}\mathbin {\small {as}}y)\mathbin {\small {filter}}x\mathbin {.}\mathit {id}=y\mathbin {.}\mathit {id}. \end{equation}\)
    (2)
    \(R\mathbin {\small {as}}x\) matches any event of type \(R\) and binds its value to a variable \(x\) . Thus, e.g., \(\mathit {out}\mathbin {\small {as}}y\) matches an event of type \(\mathit {out}\) emitted when some item is taken out of the shop. Pattern \(\phi \mathbin {\small {filter}}P\) matches whatever \(\phi\) matches, but only if condition \(P\) is satisfied. In the above pattern, it is used to ensure, by comparing the \(\mathit {id}\) ’s of the matched events, that the item taken from the shelf (according to the event matched with \(\mathit {get} \mathbin {\small {as}}x\) ) is the same item which left the shop (according to the event matched with \(\mathit {out}\mathbin {\small {as}}y\) ). Semicolon is a sequencing operator: \(\phi \mathbin {;}\psi\) means “first match \(\phi\) somewhere in the stream, and then match \(\psi\) somewhere in the part of the stream which comes after the last event matched by \(\phi\) ” (see Figure 1). Thus, between \(\phi\) and \(\psi\) in the stream, there can be any number of other events. We usually do not care about unrelated events in the stream between those we care about. Hence, any reasonable interpretation of the sequencing operator implicitly skips them. This, unfortunately, means that our attempt at discovering theft in Equation (2) is incorrect: Between taking an item off the shelf and taking it out of the shop there could be many events in the stream, including the one indicating payment for the item. To ensure that between the two matched events there is no related payment event we use skip restricting operator:
    \(\begin{equation} (\mathit {get} \mathbin {\small {as}}x)\mathbin {;}((\mathit {out}\mathbin {\small {as}}y \mathbin {\small {filter}}x\mathbin {.}\mathit {id}=y\mathbin {.}\mathit {id})\mathbin {\small {noskip}}(\mathit {pay}\mathbin {\small {as}}z\mathbin {\small {filter}}x\mathbin {.}\mathit {id}=z\mathbin {.}\mathit {id})). \end{equation}\)
    (3)
    It is not practical to demand that a pattern cannot appear in the stream since streams are infinite. Thus, we do not consider a pattern like \(\mathsf {Not}\;\phi\) matching when the stream does not match \(\phi\) . Instead, all practical negation-like patterns must select a finite window in the stream to which they limit the search for the undesirable pattern. Also, one can note that negation is desirable in event patterns mostly because any event can be implicitly skipped by the pattern processor (unlike in, say, usual regular expressions where the pattern accounts for all characters). Hence, we decided to use only a minimalist form of bounded negation in the form of \(\phi \mathbin {\small {noskip}}\psi\) pattern which matches part \(L\) of the stream matched by \(\phi\) if \(L\) does not contain any events matched by \(\psi\) which were skipped when matching with \(\phi\) . Pattern \(\psi\) must match a single event. In the concrete case of pattern in Equation (3) this means that the pattern matches stream fragment of the form
    \(\begin{equation*} \ldots \mathit {get}(\mathit {id},\mathit {sid}, \mathit {temp}, \tau _1), e_1, e_2,\ldots , e_n, \mathit {out}(\mathit {id}, \tau _2)\ldots \end{equation*}\)
    iff none of the events \(e_1,e_2, \ldots , e_n\) is of the form \(\mathit {pay}(\mathit {id}, \tau _3)\) .
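    To make this reading of Equation (3) concrete, below is a minimal Python sketch (our own illustration with hypothetical names, not the implementation from [30]) which enumerates matchings of this pattern on a finite stream prefix; events are represented as plain dictionaries.
```python
# A minimal sketch (our own notation, not the paper's implementation): events
# are dictionaries with a "type" key plus attribute keys. The function
# enumerates matchings of
#   (get as x);((out as y filter x.id=y.id) noskip (pay as z filter x.id=z.id))
# on a finite stream prefix, returning the positions bound to x and y.
def theft_matches(events):
    matches = []
    for i, ev_get in enumerate(events):
        if ev_get["type"] != "get":
            continue
        for j in range(i + 1, len(events)):
            ev_out = events[j]
            if ev_out["type"] != "out" or ev_out["id"] != ev_get["id"]:
                continue
            # noskip: none of the skipped events between the two matched ones
            # may be a pay-event for the same item.
            skipped = events[i + 1:j]
            if any(e["type"] == "pay" and e["id"] == ev_get["id"] for e in skipped):
                continue
            matches.append((i, j))
    return matches

# The stream prefix from Example 2.1.
stream = [
    {"type": "shelf", "sid": 0, "temp": 10.5, "tau": 3},
    {"type": "put", "id": 2, "sid": 0, "temp": 10.0, "tau": 4},
    {"type": "get", "id": 1, "sid": 7, "temp": 18.0, "tau": 7},
    {"type": "shelf", "sid": 0, "temp": 11.5, "tau": 10},
    {"type": "shelf", "sid": 0, "temp": 12.0, "tau": 15},
    {"type": "shelf", "sid": 0, "temp": 18.5, "tau": 20},
    {"type": "out", "id": 1, "tau": 25},
]
print(theft_matches(stream))  # [(2, 6)]: the get at position 2 and the out at position 6
```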
    Fig. 1. Illustration of sequencing operator semantics.
    While the pattern in Equation (3) does discover shopliftings without raising obvious false alarms it is still incorrect in a more subtle way: We expect variable \(x\) to be bound to the event indicating that the stolen item was taken from the shelf by a thief or their accomplice. This information can then be used to identify the criminal by consulting security cameras pointing towards the relevant shelf. Unfortunately, since events can be skipped when pattern matching, the pattern in Equation (3) also matches the following situation:
    \(\begin{equation*} \ldots \begin{array} {|c|}\hline \hline {{\mathit {get}(\mathit {id},\mathit {sid}, \mathit {temp}, \tau _1)}} \end{array} \ldots \mathit {put}(\mathit {id},\mathit {sid_1}, \mathit {temp}, \tau _2)\ldots \mathit {get}(\mathit {id},\mathit {sid_1}, \mathit {temp}, \tau _3)\ldots \begin{array} {|c|}\hline \hline {{\mathit {out}(\mathit {id}, \tau _4)}} \end{array}\ldots \end{equation*}\)
    Above, the framed events are matched by the pattern and are bound to variables \(x\) and \(y\) . Thus, the resulting complex event reports incorrectly that the stolen item was last taken from the shelf at \(\tau _1\) , whereas in reality it was then moved to another shelf (perhaps by an innocent employee) and only then taken by a thief. Thus, we need to forbid skipping both \(\mathit {pay}\) and \(\mathit {put}\) -events. We apply the _ \(\mathbin {\small {or}}\) _ pattern in the second argument of the _ \(\mathbin {\small {noskip}}\) _ operator. Pattern \(\phi _1\mathbin {\small {or}}\phi _2\) matches a stream fragment if it is matched by either \(\phi _1\) or \(\phi _2\) . Thus, finally, we arrive at the shoplifting discovering pattern from our initial query in Equation (1).
    Is this solution correct? In practice (probably)—yes. But there is an edge case, preventable by the typical physical arrangement of shops, which makes it possible for this pattern to raise false shoplifting alarm. Namely, the customer could pay for an item, go back to a shelf, put the item there, take it again, and leave the shop without paying (since they paid already). The author noticed this possibility only long after this example was fully written. This example shows how non-obvious some matchings of a simple pattern can be, and, therefore, how much one needs the automated tools for exploration of the possible matchings. A necessary prerequisite for the existence of such tools is formal semantics for patterns, such as the one presented in this article.
    In the next examples we introduce iteration, and also the approach which leads to tree semantics:
    Example 2.3.
    The task is to recognize a dangerous increase of storage temperature in a shelf. Such a complex event is recognized if there are at least three events reporting that the temperature is steadily increasing for at least 5 and no more than 10 units of time. The associated complex event with schema \(\mathit {inc}_t(\mathit {sid}, \Delta _\tau , \Delta _t, \tau)\) shows the reported duration of the increase ( \(\Delta _\tau\) ), the change in temperature ( \(\Delta _t\) ), the identifier of the shelf ( \(\mathit {sid}\) ), and the timestamp of the complex event ( \(\tau\) ). The query which facilitates this task is as follows:
    \(\begin{align} &\small{SELECT}\quad \mathit {inc}_t(y\mathbin {.}\mathit {sid}, y\mathbin {.}\tau - y\mathbin {.}\tau _{\text{min}}, y\mathbin {.}t_{\text{max}}-y\mathbin {.}t_{\text{min}}, y\mathbin {.}\tau)\nonumber \nonumber\\ &\small{FROM}\quad ((\mathit {shelf}\mathbin {\small {as}}x){+}\small {agr}(\mathit {sid}:\mathbf {head}(x\mathbin {.}\mathit {sid}), \mathit {eq}:\mathbf {eq}(x\mathbin {.}\mathit {sid}), \mathit {inc}:\mathbf {inc}(x\mathbin {.}\mathit {temp}),\nonumber \nonumber\\ &\quad \quad \quad \quad \quad \quad t_{\text{max}}:\mathbf {max}(x\mathbin {.}\mathit {temp}), t_{\text{min}}:\mathbf {min}(x\mathbin {.}\mathit {temp}), \tau _{\text{min}}:\mathbf {min}(x\mathbin {.}\tau))\mathbin {\small {as}}y \nonumber \nonumber\\ &\quad \quad \quad \quad \quad \quad \quad \mathbin {\small {filter}}(y\mathbin {.}\mathit {eq}\wedge y\mathbin {.}\mathit {inc}\wedge y\mathbin {.}\tau -y\mathbin {.}\tau _{\text{min}}\ge 5\wedge y\mathbin {.}\tau -y\mathbin {.}\tau _{\text{min}}\le 10) \end{align}\)
    (4)
    Pattern \(\phi {+}\small {agr}(\ldots)\mathbin {\small {as}}x\) matches a sequence of one or more matchings of \(\phi\) , i.e., it matches the same fragment of a stream as \(\phi \mathbin {\small {or}}(\phi \mathbin {;}\phi)\mathbin {\small {or}}(\phi \mathbin {;}\phi \mathbin {;}\phi)\mathbin {\small {or}}\cdots\) . We cannot directly refer to variables bound by \(\phi\) since \(\phi\) is matched multiple times. Instead, we aggregate variables bound by \(\phi\) with aggregate functions (see Figure 2). Aggregate functions are interpreted as functions on lists of values. E.g., \(\mathbf {head}(L)\) returns the first element of a list \(L\) , \(\mathbf {eq}(L)\) is true iff all elements of \(L\) are identical, \(\mathbf {inc}(L)\) is true iff \(L\) is increasing, and \(\mathbf {min}(L)\) and \(\mathbf {max}(L)\) return the minimum and maximum of \(L\) , respectively. A pattern \(\psi :=\phi {+}\small {agr}(a_0:f_0(E_0), a_1:f_1(E_1), \ldots , a_n:f_n(E_n))\mathbin {\small {as}}x\) binds variable \(x\) to a special event \(\small {agr}(v_0, v_1,\ldots ,v_n, v_\tau)\) with attributes \(a_0, a_1, \ldots , a_n,\tau\) , respectively. Here \(v_i\) is the result of applying aggregate function \(f_i\) to the list of values of the expression \(E_i\) computed with respect to consecutive successful matchings of \(\phi\) , and \(v_\tau\) is the timestamp of the last event matched by \(\psi\) . Thus, in the query in Equation (4) \(y\) is bound to an event such that \(y\mathbin {.}\mathit {sid}\) is the sid of the first \(\mathit {shelf}\) -event matched, \(y\mathbin {.}\mathit {eq}\) is equal to true iff all the \(\mathit {shelf}\) -events matched have the same sid, and so on.
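    As an illustration, the aggregate functions mentioned above can be read as ordinary functions on lists of per-iteration values. The following Python sketch is our own; in particular, we assume that \(\mathbf {inc}\) means strictly increasing.
```python
# Aggregate functions as plain functions on the list of per-iteration values
# (a sketch; we assume "inc" means strictly increasing).
def head(values):
    return values[0]

def eq(values):
    return all(v == values[0] for v in values)

def inc(values):
    return all(a < b for a, b in zip(values, values[1:]))

# If three shelf-events with sid's [0, 0, 0], temperatures [10.5, 11.5, 12.0]
# and timestamps [3, 10, 15] were matched by the iteration in Equation (4),
# the event bound to y would have y.sid = head(sids), y.eq = eq(sids),
# y.inc = inc(temps), y.t_max = max(temps), y.t_min = min(temps),
# y.tau_min = min(taus), and y.tau equal to the last timestamp, 15.
sids, temps, taus = [0, 0, 0], [10.5, 11.5, 12.0], [3, 10, 15]
print(head(sids), eq(sids), inc(temps), max(temps), min(temps), min(taus))
# 0 True True 12.0 10.5 3
```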
    Fig. 2. Illustration of binding of variables in iterated patterns.
    Example 2.4.
    The following pattern demonstrates nesting of iterations (in this case, the example is deliberately abstract, and thus we do not provide any interpretation for the events used):
    \(\begin{equation*} ((R\mathbin {\small {as}}x);((R\mathbin {\small {as}}y){+}\small {agr}(c:\text{min}(y\mathbin {.}b)) \mathbin {\small {as}}z)){+}\small {agr}(m:\text{max}(x\mathbin {.}a*z\mathbin {.}c))\mathbin {\small {as}}t. \end{equation*}\)
    Interestingly, given only a set of positions in the event stream matched by this pattern, it is not always clear which events were matched by which parts of the pattern (see e.g., Figure 3). Arguably, it is often useful for the semantics of the pattern to capture not just the events matched, but also the intended role of those events. Thus, the denotational semantics of the pattern should contain matchings presented in Figure 3 where each matched part of the stream is clearly associated with the subpattern it matched with.
    Fig. 3. Illustration of nested iterations and “annotated” pattern matching results.

    3 Preliminaries

    In this section, we introduce some basic notation to be used in the sequel.
    We denote by \(\mathcal {D}\) the set of values. We assume that this set includes at least all real and integer numbers, lists of values, events (to be introduced later), and a special value \(\mathbf {null}\) . We also assume that it is a partially ordered set, with the ordering relation denoted by “ \(\le\) ” which extends standard ordering on numbers.
    Our notation for lists is inspired by Prolog: We write \([a_0, a_1, a_2, \ldots , a_n]\) to denote a list of \(n+1\) elements \(a_0, a_1, a_2, \ldots , a_n\) . We write \([a_0, a_1, a_2, \ldots , a_n\;|\;L]\) to denote a list which starts with elements \(a_0, a_1, a_2, \ldots , a_n\) and ends with a list \(L\) . We denote list concatenation by the addition sign, e.g., \([1, 2]+[3, 4]=[1, 2, 3, 4]\) . Given a set of lists \(X\) and a list \(p\) we define \(p+X:=\lbrace p+q\;|\;q\in X\rbrace\) .
    Let \(\mathbf {V}\) be a set of variables. A variable substitution is a function \(\sigma :\mathbf {V}\rightarrow \mathcal {D}\cup \mathbf {V}\) such that \(\sigma (x)=x\) for all but a finite number of variables \(x\in \mathbf {V}\) . We denote \(\small{Dom}(\sigma):=\lbrace x\in \mathbf {V}\;|\;\sigma (x)\ne x\rbrace\) . We will assume that for any variable substitution \(\sigma\) we have \(\sigma (x)\in \mathcal {D}\) for any \(x\in \small{Dom}(\sigma)\) . We denote a substitution \(\sigma\) such that \(\small{Dom}(\sigma)=\lbrace x_0,\ldots , x_n\rbrace\) and \(\sigma (x_i)=a_i\) for \(i\in \lbrace 0, \ldots , n\rbrace\) by \(\lbrace x_0/a_0,\ldots , x_n/a_n\rbrace\) . Given a substitution \(\sigma\) and a set of variables \(X\subseteq \mathbf {V}\) we define the restriction of \(\sigma\) to \(X\) by \(\sigma |_X(x):={\left\lbrace \begin{array}{ll} \sigma (x)&\text{if}\;x\in X,\\ x&\text{if}\;x\notin X \end{array}\right.}\) . Clearly, \(\small{Dom}(\sigma |_X)=\small{Dom}(\sigma)\cap X\) . Substitutions extend to arbitrary terms with \(\sigma (f(t_0, \ldots , t_n)):=f(\sigma (t_0), \ldots , \sigma (t_n))\) , where \(t_0, \ldots , t_n\) are terms and \(f\) is a function symbol of some arity \(n+1\) . Using this we can define the composition \(\sigma _1\circ \sigma _2\) of substitutions \(\sigma _1\) and \(\sigma _2\) as the ordinary function composition, i.e., \((\sigma _1\circ \sigma _2)(x):=\sigma _1(\sigma _2(x))\) . Given a term \(T\) we will denote by \(\small{Vars}(T)\) the set of variables contained in \(T\) .
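    The following Python sketch (our own, with hypothetical names) mirrors these definitions: a substitution is a finite map from variables to values, restriction keeps only the chosen variables, and application descends into terms represented as nested tuples.
```python
# A sketch of variable substitutions: a substitution is a dict from variable
# names to values; variables outside its domain are mapped to themselves.
def apply_subst(sigma, term):
    if isinstance(term, str):      # a variable
        return sigma.get(term, term)
    if isinstance(term, tuple):    # a term f(t_0, ..., t_n) encoded as (f, t_0, ..., t_n)
        f, *args = term
        return (f,) + tuple(apply_subst(sigma, t) for t in args)
    return term                    # any other value is left untouched

def restrict(sigma, variables):
    # The restriction sigma|_X keeps only the bindings of variables in X.
    return {x: v for x, v in sigma.items() if x in variables}

sigma = {"x": 1, "y": 2}
print(apply_subst(sigma, ("f", "x", ("g", "y", "z"))))  # ('f', 1, ('g', 2, 'z'))
print(restrict(sigma, {"x"}))                           # {'x': 1}
```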
    Patterns are evaluated on streams of events. Usually, events are not opaque but carry some data. We use a data model where events are essentially records. We assume that attributes of events can have any elements of \(\mathcal {D}\) as values, which may include lists and other events. First, we define an event schema:
    Definition 3.1.
    An event schema \(\mathcal {R}\) is a set of terms of the form \(f(a_1, a_2,\ldots ,a_n)\) called event type specifications, where \(f\) is an event type and \(a_1, a_2,\ldots ,a_n\) is a list of distinct attribute names of events of this type (the order matters). We assume that each event type has a timestamp attribute \(\tau\) (we represent timestamps by the real numbers). We also assume that for each event type, there is a unique event specification in \(\mathcal {R}\) .
    Given an event schema we can define events conforming to this schema:
    Definition 3.2.
    Given an event schema \(\mathcal {R}\) , an \(\mathcal {R}\) -event \(e:=f(v_1, v_2, \ldots , v_n)\) is a term such that \(v_1, v_2, \ldots , v_n\in \mathcal {D}\) are values and \(f(a_1, a_2,\ldots , a_n)\in \mathcal {R}\) for some attribute names \(a_1, a_2,\ldots , a_n\) . In this case, for all \(i\in \lbrace 1, \ldots , n\rbrace\) , we call \(v_i=:e.a_i\) the value of the attribute \(a_i\) of \(e\) . If an event \(e\) does not have an attribute \(a\) then we define \(e.a := \mathbf {null}\) . We denote the set of all possible \(\mathcal {R}\) -events by \(\small{Events}_\mathcal {R}\) .
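    A small Python sketch of Definitions 3.1 and 3.2 (names and representation are our own choice): an event schema maps each event type to its list of attribute names, and attribute access on an event returns \(\mathbf {null}\) (here None) when the attribute is absent.
```python
# A sketch of Definitions 3.1 and 3.2: the event schema R from Section 2 maps
# each event type to its list of attribute names (every type has a timestamp "tau").
SCHEMA = {
    "put":   ["id", "sid", "temp", "tau"],
    "get":   ["id", "sid", "temp", "tau"],
    "pay":   ["id", "tau"],
    "out":   ["id", "tau"],
    "shelf": ["sid", "temp", "tau"],
}

def make_event(event_type, *values):
    """Build the R-event f(v_1, ..., v_n) as a dictionary keyed by attribute names."""
    attrs = SCHEMA[event_type]
    assert len(values) == len(attrs), "wrong number of attribute values"
    return {"type": event_type, **dict(zip(attrs, values))}

def attr(event, name):
    """e.a; returns null (None) when the event has no attribute a."""
    return event.get(name)

e = make_event("pay", 1, 25)
print(attr(e, "id"), attr(e, "tau"), attr(e, "temp"))  # 1 25 None
```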

    4 Event Streams and Stream Patterns

    Assume an event schema \(\mathcal {R}\) .
    An \(\mathcal {R}\) -event stream is a map \(\Gamma :\mathbb {N}\rightarrow \small{Events}_\mathcal {R}\) such that \(i\lt j\) implies \(\Gamma (i)\mathbin {.}\tau \le \Gamma (j)\mathbin {.}\tau\) for all \(i,j\in \mathbb {N}\) . For brevity we write \(\Gamma _i:=\Gamma (i)\) for all \(i\in \mathbb {N}\) .
    The grammar of patterns \(\phi\) in PatLang is as follows:
    \(\begin{align} \phi \quad ::=\quad & R\mathbin {\small {as}}x \quad |\quad R\quad |\quad \phi \mathbin {\small {filter}}P\quad |\quad \phi \mathbin {\small {or}}\phi \quad |\quad \phi \mathbin {\small {and}}\phi \quad |\quad \phi \mathbin {\small {noskip}}\psi \nonumber \nonumber\\ &\quad |\quad \phi \mathbin {;}\phi \quad |\quad {\phi +}\quad |\quad {\phi +}\small {agr}(a_0:f_0(E_0), a_1:f_1(E_1),\ldots)\mathbin {\small {as}}x. \end{align}\)
    (5)
    Above,
    \(R\) is either an event type or the special symbol \(?\) denoting an event of any type, i.e.,
    \(\begin{equation*} R\quad ::=\quad \langle \text{event type}\rangle \quad |\quad ? \end{equation*}\)
    \(x\in \mathbf {V}\) is an event variable
    \(\psi\) is any pattern matching single events, more precisely,
    \(\begin{equation*} \psi \;::=\;R\quad |\quad R\mathbin {\small {as}}x \quad |\quad \psi \mathbin {\small {filter}}P\quad |\quad \psi \mathbin {\small {or}}\psi , \end{equation*}\)
    \(a_0, a_1, \ldots\) are attribute names, and \(f_0,f_1,\ldots\) are aggregate functions, such as \(\text{sum}()\) , \(\text{avg}()\) , and so on.
    \(E_i\) ’s are expressions constructed from numeric literals and attribute references of the form \(x\mathbin {.}a\) (where \(x\) is an event variable and \(a\) is an attribute) with arithmetic operators (addition, multiplication, etc.), thus,
    \(\begin{equation*} E\quad ::=\quad x\mathbin {.}a\quad |\quad \langle \mathit {number}\rangle \quad | \quad -E\quad |\quad E_1+E_2\quad |\quad E_1-E_2\quad |\quad E_1*E_2\quad |\quad \ldots \end{equation*}\)
    \(P\) is a predicate formula constructed from comparisons of the form \(E_1\le E_2\) , \(E_1=E_2\) , and so on (where \(E_i\) ’s are expressions) using conjunction ( \(\wedge\) ) and disjunction ( \(\vee\) ), i.e.,
    \(\begin{equation*} P\quad ::=\quad E_1=E_2\quad |\quad E_1\ne E_2\quad |\quad E_1\le E_2\quad |\quad \ldots \quad | \quad P_1\vee P_2\quad |\quad P_1\wedge P_2. \end{equation*}\)
    The informal semantics of pattern constructors is as follows:
    If \(R\) is an event type then \(R\) matches any event of this type occurring after the start of matching. Before matching, any number of events (including events of type \(R\) ) may be skipped. If \(R=?\) then \(R\) matches any event of any type. \(R\mathbin {\small {as}}x\) matches like \(R\) , but it also binds the matched event to a variable \(x\) .
    \(\phi \mathbin {\small {filter}}P\) matches complex events matched by \(\phi\) if they satisfy \(P\) (and binds whatever \(\phi\) binds),
    \(\phi _1 \mathbin {\small {or}}\phi _2\) matches complex events matched by \(\phi _1\) or by \(\phi _2\) and binds variables bound by both \(\phi _1\) and \(\phi _2\) .
    \(\phi _1 \mathbin {\small {and}}\phi _2\) matches complex events consisting of events matched by \(\phi _1\) and \(\phi _2\) (or, in other words, it matches the union of stream fragments matched by both \(\phi _1\) and \(\phi _2\) ). It binds variables bound by at least one of \(\phi _1\) or \(\phi _2\) (and the binding by \(\phi _2\) shadows the binding by \(\phi _1\) ).
    \(\phi \mathbin {\small {noskip}}\psi\) matches a stream fragment matched by \(\phi\) , but without skipping events matched by \(\psi\) (recall that \(\psi\) is allowed to match single events only). Normally, any event can be skipped during matching, so this construct permits control over skipping of events. \(\phi \mathbin {\small {noskip}}\psi\) binds whatever variables \(\phi\) binds (it does not bind variables bound by \(\psi\) , though).
    \(\phi _1\mathbin {;}\phi _2\) matches a complex event matched by \(\phi _1\) followed by a complex event matched by \(\phi _2\) . It binds variables bound by at least one of \(\phi _1\) or \(\phi _2\) (and the binding by \(\phi _2\) shadows the binding by \(\phi _1\) ).
    \({\phi +}\small {agr}(a_0:f_0(E_0),\ldots , a_n:f_n(E_n))\mathbin {\small {as}}x\) matches a sequence of one or more complex events matched by \(\phi\) . For each matching of \(\phi\) expressions \(E_0, \ldots , E_n\) are evaluated in the context of variables bindings provided by \(\phi\) and the results are aggregated using aggregate functions \(f_0,\ldots ,f_n\) . The final aggregated values \(v_0,\ldots , v_n\) are collected into a special event \(\small {agr}(v_0,\ldots , v_n, v_\tau)\) with specification \(\small {agr}(a_0,\ldots , a_n, \tau)\) (assumed to belong to event schema) which is then bound to a variable \(x\) ( \(x\mathbin {.}\tau\) is bound to the timestamp \(v_\tau\) of the last event matched by the iteration construct). Since \(\phi\) is matched potentially multiple times, bindings provided by those matchings are not directly available (though they can impact the value of the aggregate event).
    \({\phi +}\) matches a sequence of one or more complex events matched by \(\phi\) . It does not bind any variables.
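    To fix ideas, the grammar in Equation (5) can be represented by a small abstract syntax tree. The following Python sketch (class and function names are our own choice, and predicates, expressions, and aggregate specifications are kept opaque) is reused in the later sketches of subpattern positions and binders.
```python
# A sketch of an abstract syntax for the pattern grammar in Equation (5).
# Predicates (P), expressions (E) and aggregate specifications are opaque strings here.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Simple:    # R  or  R as x  (rtype may be "?")
    rtype: str
    var: Optional[str] = None

@dataclass
class Filter:    # phi filter P
    phi: object
    pred: str

@dataclass
class Or:        # phi1 or phi2
    left: object
    right: object

@dataclass
class And:       # phi1 and phi2
    left: object
    right: object

@dataclass
class NoSkip:    # phi noskip psi
    phi: object
    psi: object

@dataclass
class Seq:       # phi1 ; phi2
    left: object
    right: object

@dataclass
class Iter:      # phi+  or  phi+ agr(...) as x
    phi: object
    aggs: Tuple[str, ...] = ()
    var: Optional[str] = None

def children(phi):
    """Direct subpatterns of phi, in the order used for positions below."""
    if isinstance(phi, Simple):
        return []
    if isinstance(phi, (Filter, Iter)):
        return [phi.phi]
    if isinstance(phi, NoSkip):
        return [phi.phi, phi.psi]
    return [phi.left, phi.right]  # Or, And, Seq
```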
    Remark 1.
    Concatenation (“ \(;\) ”), alternative ( \(\mathbin {\small {or}}\) ) and iteration (“ \(+\) ”) are clearly inspired here, as in [18], by regular expressions. In fact, recursive definitions of automata recognizing strings of events described by patterns (see, e.g., [18], [25]) mimic standard constructions of non-deterministic finite automata recognizing regular languages. This justifies calling PatLang “regular expression-like”. There are, however, two important differences between “regular-like” analysis of streams of events and recognition of actual regular languages, which merit both important differences in semantics between analogous operators, and the presence of additional operators in event pattern languages like PatLang: First, events do not (usually) form a finite alphabet, and this forces the use of filtering operators over infinite domains. Second, complex event recognition, unlike recognition of words in regular languages, permits skipping of both irrelevant and relevant events when processing event streams.
    Remark 2.
    We decided to use \(\mathbin {\small {noskip}}\) in our pattern language instead of a full bounded negation because it is a minimalist form of skip control, sufficient in most cases where some kind of negation is called for. In fact, negation is needed in the case of matching patterns of events precisely because events may be skipped, and between matched events there may occur some events we do not want to occur. Of course, adding bounded negation would have increased the expressibility of our language. However, it would also have somewhat complicated the already complex enough semantics. More importantly, it would also have made the implementation of our pattern exploration tool [30], which uses narrowing to efficiently generate the streams satisfying the given pattern, significantly more complex. More precisely, because the tool only generates partially instantiated strings of events (in particular, arbitrary stretches of skipped events between matched ones are represented as single constrained variables), it is not enough just to check that some event substring does not match the negated pattern: we need to preserve that as a constraint. Thus, we decided that the pattern language as presented in this article is good enough for the present. It permits us to express many non-trivial examples, and the additional expressibility is not worth the programming and mathematical effort.
    An important part of stream query languages is time-windowing: the ability to limit the span of matched events to a given maximal time interval. This is crucial for the practical viability of any such query language because only time-bounded complex events can always be either recognized completely or be proven to be a dead end after considering only a finite part of the event stream. In [18] the only notion of time built into the formalism was the position of an event in the event stream. However, windowing with respect to the number of events in the stream would be both semantically meaningless and misleading computationally: Why would irrelevant events influence the size of a window? And, while we expect the number of events in the stream within a unit of time to be bounded by the throughput of the system, we do not expect it to be strictly determined. While our pattern language does not provide a separate time windowing construct, it can be easily achieved using \(\mathbin {\small {filter}}\) and the fact that all the events are timestamped. E.g., in order to match a \(\mathit {get}\) -event within 10 units of time of a \(\mathit {put}\) -event for the same item we can use the following pattern:
    \(\begin{equation*} (\mathit {put} \mathbin {\small {as}}x)\mathbin {;}(\mathit {get}\mathbin {\small {as}}y) \mathbin {\small {filter}}x.\mathit {id}=y.\mathit {id}\wedge y.\tau -x.\tau \le 10. \end{equation*}\)
    Suppose that time constraints are treated as ordinary constraints by the pattern matching implementation, and that the appropriate \(\mathit {get}\) -event is not encountered within 10 units. In this case, the implementation would keep searching for a \(\mathit {get}\) -event with a small enough timestamp and would never find it, as timestamps never decrease. One reason to have separate time constructs in the pattern language is that one can then treat some conditions involving time as special cases, easily avoiding the problem just described by dropping the matching of the current complex event immediately after encountering any event outside the given time window. We decided not to include any special time handling operators for two reasons. Firstly, it turns out we can utilize the \({\mathbin {\small {noskip}}}\) operator to rewrite the above pattern so that this eager abortion of matching will be enforced even in a naive implementation:
    \(\begin{align*} &(\mathit {put} \mathbin {\small {as}}x)\mathbin {;}((\mathit {get}\mathbin {\small {as}}y \mathbin {\small {filter}}x.\mathit {id}=y.\mathit {id}\wedge y.\tau -x.\tau \le 10)\\ &\quad \quad \mathbin {\small {noskip}}(?\mathbin {\small {as}}z\mathbin {\small {filter}}z.\tau -x.\tau \gt 10)). \end{align*}\)
    Thus, we cannot skip any event later than \(x.\tau +10\) nor can we match it (even if it is of type \(\mathit {get}\) ). Hence, once we encounter any event with a timestamp greater than \(x.\tau +10\) the matching of the current complex event fails. This solution can be easily adapted to more general cases. Secondly, situations where searching for more events is guaranteed not to lead to a successful matching may also happen with conditions which do not involve time. Consider, e.g., the following pattern:
    \(\begin{align*} &(\mathit {shelf}\mathbin {\small {as}}x\mathbin {\small {filter}}x.\mathit {sid}=1\wedge x.\mathit {temp}\gt 10){+} \small {agr}(c:\mathbf {count}(*), \tau _m:\mathbf {min}(x.\tau)) \mathbin {\small {as}}y\\ &\quad \quad \quad \quad \mathbin {\small {filter}}y.c\gt 3\wedge y.c\lt 10\wedge y.\tau -y.\tau _m\lt 20. \end{align*}\)
    First, note that because of syntactic constraints, we cannot apply the aforementioned technique to abort iteration as soon as it leaves the time window. In addition, a direct implementation could keep trying further iterations even after the iteration count becomes greater than 9. In general, a condition may become true after sufficiently many iterations, and only in special cases, like the one above, can we be sure that iterating more will not help. We feel that this problem should be addressed through query optimization, and not by adding special constructs to the language.
    Patterns may contain subpatterns. The subpattern structure of patterns can be presented using subpattern parse trees. E.g., Figure 4 presents the subpattern parse tree of the following pattern:
    \(\begin{align} &(\ (T\mathbin {\small {as}}y)\mathbin {;} (\nonumber \nonumber\\ &\quad \quad ((R\mathbin {\small {as}}x)\mathbin {;}((S\mathbin {\small {as}}x){+}\small {agr}(\mathit {tot}: \text{sum}(x\mathbin {.}w), \mathit {cnt}: \text{count}(*))\mathbin {\small {as}}z)\mathbin {\small {filter}}x\mathbin {.}m \ge z\mathbin {.}\mathit {tot}\nonumber \nonumber\\ &\quad \quad){+}\small {agr}(v: \text{avg}(z\mathbin {.}\mathit {cnt}-y\mathbin {.}h)) \mathbin {\small {as}}y\nonumber \nonumber\\ &\quad) \mathbin {\small {filter}}y\mathbin {.}v \ge 5\wedge y\mathbin {.}v \le 10)\mathbin {\small {or}}(R+). \end{align}\)
    (6)
    Fig. 4. Subpattern parse tree for the query in Equation (6).
    Leaves of subpattern parse trees, which play a crucial role in the description of the binding structure of patterns and their tree semantics, are labelled by patterns with no proper subpatterns, i.e., either \(R\mathbin {\small {as}}x\) or \(R\) . Internal nodes are labelled by pattern constructors with empty slots (denoted by underscores _) for subpatterns, and their child trees correspond to the direct subpatterns.
    Having a subpattern parse tree we can address any subpattern with its position—a list of integers. The whole pattern (i.e., the root of the subpattern parse tree) has position \([]\) (the empty list), and for each subpattern with position \(p\) , position of its \(i\) th direct subpattern is given by \(p+[i]\) . Consider, e.g., the pattern (6) and its subpattern parse tree in Figure 4. The subpattern at position \([1, 1, 1]\) is \(T\mathbin {\small {as}}y\) and subpattern at position \([2, 1]\) is \(R\) . We denote by \(\mathbb {P}(\phi)\) the set of positions of subpatterns of a pattern \(\phi\) . Given \(p\in \mathbb {P}(\phi)\) we denote by \(\phi _p\) the subpattern of \(\phi\) at position \(p\) .
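    Continuing the Python sketch of the pattern syntax, positions and subpattern access can be computed directly from the parse tree (function names are ours):
```python
# Continuing the sketch: positions of subpatterns as tuples of integers, the set
# P(phi) of all such positions, and the subpattern phi_p at a given position p.
def positions(phi, prefix=()):
    result = {prefix}
    for i, child in enumerate(children(phi), start=1):
        result |= positions(child, prefix + (i,))
    return result

def subpattern_at(phi, p):
    for i in p:
        phi = children(phi)[i - 1]
    return phi

# For (R as x);((S as y) filter x.a = y.a) the subpattern at position (2, 1) is S as y.
pat = Seq(Simple("R", "x"), Filter(Simple("S", "y"), "x.a = y.a"))
print(subpattern_at(pat, (2, 1)))  # Simple(rtype='S', var='y')
print(sorted(positions(pat)))      # [(), (1,), (2,), (2, 1)]
```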
    There are two kinds of pattern constructors that bind variables: _ \(\mathbin {\small {as}}\) _ and (_) \({+}\small {agr}(\ldots)\mathbin {\small {as}}\) _. Each of them introduces a single variable. Thus, a position corresponding to one of those constructors uniquely identifies a variable. Given a pattern \(\phi\) we denote by \(\mathbb {P}_v(\phi)\) the set of positions of subpatterns of \(\phi\) constructed with either _ \(\mathbin {\small {as}}\) _ or \((\_){+}\small {agr}(\ldots)\mathbin {\small {as}}\_\) . Given \(p\in \mathbb {P}_v(\phi)\) we define \(v_\phi (p)\) to be the unique variable bound by \(\phi _p\) : Thus, if \(\phi\) is a pattern in Equation (6) then \(v_\phi ([1,1, 1])=y\) and \(v_\phi ([1,1,2,1,1,2])=z\) .
    Events bound to variables are referenced either in conditions inside the _ \(\mathbin {\small {filter}}\) _ construct or in aggregate expressions. We denote the set of positions \(p\in \mathbb {P}(\phi)\) such that \(\phi _p\) is of the form \(\phi ^{\prime }\mathbin {\small {filter}}P\) or of the form \(\phi ^{\prime }{+}\small {agr}(\ldots)\mathbin {\small {as}}x\) by \(\mathbb {P}_r(\phi)\) .
    Let us now explain with an example the concept of variable binding in the context of our patterns.
    Example 4.2.
    In pattern \(R\mathbin {\small {as}}x \mathbin {\small {filter}}x\mathbin {.}a=y\mathbin {.}a\) variable \(x\) occurs in two places: the subpattern \(R\mathbin {\small {as}}x\) and in the condition \(x\mathbin {.}a=y\mathbin {.}a\) . \(R\mathbin {\small {as}}x\) binds \(x\) to some event \(E\) of type \(R\) during the execution of matching of this pattern with some event stream, and the second occurrence is in the scope of this binding. The latter means that when checking this condition the execution engine will consider \(x\) to be equal to \(E\) . On the other hand, the sole occurrence of variable \(y\) (in the condition \(x\mathbin {.}a=y\mathbin {.}a\) ) is free: this means that matching alone cannot provide the value of this variable. Thus, before executing matching with this pattern we have to provide externally some event as a value of \(y\) . Incidentally, since \(x\) occurs only bound in the conditions of the above pattern, even if we provided externally the value of \(x\) it would be ignored.
    We now have to describe the binding structure of patterns and the scope of bindings. The general rule is that a variable must be bound before it is used. Also, later bindings can shadow earlier ones (which is why in the pattern of the form \(\phi \mathbin {;}\psi\) bindings provided by \(\phi\) are visible in \(\psi\) but not vice-versa). There are, however, many subtleties and exceptions involved here and the binding structure and scope rules of our patterns are very distinct from what one encounters in, say, first-order logic or \(\lambda\) -calculus. Therefore, prior to presenting formal definitions, we first illustrate our ideas informally with a couple of examples which also provide some intuitive justification for rules given later:
    Fig. 5. Subpattern parse tree for the query in Example 4.3.
    Example 4.3.
    Consider the following pattern (its parse tree is presented in Figure 5):
    \(\begin{equation*} (((R\mathbin {\small {as}}x)\mathbin {;}((S\mathbin {\small {as}}y)\mathbin {\small {filter}}x\mathbin {.}a=y\mathbin {.}a))\mathbin {\small {or}}((T\mathbin {\small {as}}x)\mathbin {\small {filter}}x\mathbin {.}a=y\mathbin {.}b))\mathbin {\small {filter}}x\mathbin {.}a=y\mathbin {.}c. \end{equation*}\)
    We will explain some subtleties concerning the scope of variable binding and shadowing using this pattern as an example. Above, we bind variables \(x\) and \(y\) in three places ( \(R\mathbin {\small {as}}x\) , \(S\mathbin {\small {as}}y\) and \(T\mathbin {\small {as}}x\) ), and we refer to variables \(x\) and \(y\) in three conditions ( \(x\mathbin {.}a=y\mathbin {.}a\) , \(x\mathbin {.}a=y\mathbin {.}b\) , and \(x\mathbin {.}a=y\mathbin {.}c\) ). Each of those conditions is placed in a distinct scope of binding constructs. Let us start with the first condition, \(x\mathbin {.}a=y\mathbin {.}a\):
    No surprises here: in the pattern of the form \(\phi \mathbin {\small {filter}}P\) condition \(P\) is in the scope of any binding provided by \(\phi\) , and in the pattern of the form \(\phi _1\mathbin {;}\phi _2\) all the bindings provided by \(\phi _1\) are visible in conditions in \(\phi _2\) (unless they are shadowed in \(\phi _2\) ). Consider now the second condition, \(x\mathbin {.}a=y\mathbin {.}b\):
    Here \(x\) used in the condition is bound by the \(T\mathbin {\small {as}}x\) construct filtered by this condition. However, the occurrence of \(y\) is free: it is not in the scope of \(S\mathbin {\small {as}}y\) in the other branch of the _ \(\mathbin {\small {or}}\) _ pattern. And this is logical: when executing the pattern of the form \(\phi _1\mathbin {\small {or}}\phi _2\) we match only one of the branches: either \(\phi _1\) or \(\phi _2\) . Hence, it would be absurd to refer in \(\phi _1\) to the event matched in \(\phi _2\) (or vice-versa). Thus, when executing the above pattern we need to provide externally the value of \(y\) . Finally, let us consider the last condition, \(x\mathbin {.}a=y\mathbin {.}c\):
    Here, variable \(x\) in the condition is in the scope of the whole _ \(\mathbin {\small {or}}\) _ subpattern. In this subpattern \(x\) is bound in both branches. Since we do not know which of them will actually be matched, it makes sense to consider \(x\) to be bound (syntactically) by both binders ( \(R\mathbin {\small {as}}x\) and \(T\mathbin {\small {as}}x\) ). This does not lead to any ambiguity: during the actual matching, only one branch will match and provide a unique binding to \(x\) . On the other hand, \(y\) is bound in only one (the left) branch of the _ \(\mathbin {\small {or}}\) _ subpattern. We do not know before actually executing the matching if the left branch will match. Therefore, we do not consider the occurrence of \(y\) in the last condition to be in the scope of the _ \(\mathbin {\small {or}}\) _ subpattern. Thus, this occurrence is again free.
    Example 4.4.
    Consider the following pattern (its parse tree is presented in Figure 6) together with arrows from binding constructs to variables in conditions in the scope of those constructs:
    This is an example of shadowing: Variable \(x\) in the first condition is in the scope of \(R\mathbin {\small {as}}x\) . The subpattern \(R\mathbin {\small {as}}x\mathbin {\small {filter}}x\mathbin {.}a=2\) is composed using sequencing operator _ \(\mathbin {;}\) _ with \(S\mathbin {\small {as}}x\) . Since the latter matches events further in the input stream than the first subpattern, the binding of \(x\) provided by \(S\mathbin {\small {as}}x\) shadows the binding provided by \(R\mathbin {\small {as}}x\mathbin {\small {filter}}x\mathbin {.}a=2\) (most recent binding wins). Hence \(x\) in the second condition is bound by \(S\mathbin {\small {as}}x\) and not by \(R\mathbin {\small {as}}x.\)
    Fig. 6. Subpattern parse tree for the query in Example 4.4.
    Example 4.5.
    Consider the following pattern together with arrows from binding constructs to variables in conditions in the scope of those constructs:
    This example demonstrates that in patterns of the form \(\phi {+}\small {agr}(a_1:E_1,\ldots)\mathbin {\small {as}}x\) variables bound inside \(\phi\) can be used inside \(\phi\) (if they are in scope), and also in aggregate expressions \(E_i\) , but they are inaccessible outside the iteration constructor (e.g., the occurrence of variable \(y\) outside the iteration construct is free, even though \(y\) is bound inside the iteration step). This is logical since a pattern of the form \(\phi {+}\) or \(\phi {+}\small {agr}(\ldots)\mathbin {\small {as}}x\) matches iff \(\phi\) matches one or more times. Thus a variable, say \(y\) , bound in \(\phi\) would be bound one or more times. In the latter case, to which of those bindings would we want to refer? We can, however, build a summary of those bindings in the aggregate expressions (as we do in our example computing the sum of attributes \(a\) of all \(y\) ’s bound inside the iteration operator).
    Now we are ready to define formally the binding structure of patterns. More precisely, given a pattern \(\phi\) , a variable \(x\) and a position \(p\in \mathbb {P}_r(\phi)\) where \(x\) is referenced (i.e., either in a condition or in an aggregate expression) we want to have a set of positions (if any) in \(\mathbb {P}_v(\phi)\) which bind \(x\) . We call a set of positions binding a single variable a binder for this variable. Recall that it needs to be a set of positions and not just a single position, because of how we interpret binding by patterns of the form \(\phi _1\mathbin {\small {or}}\phi _2\) (see Example 4.3). Given a pattern \(\phi\) we also need to know which variables are bound by \(\phi\) (or more specifically the set of binders for those variables), so that they can be used to construct higher-order events, e.g., in SELECT queries mentioned earlier. We provide this information by defining for all patterns \(\phi\)
    a set \(\mathfrak {B}(\phi)\) of binders for variables bound by \(\phi\) , and
    for all \(p\in \mathbb {P}_r(\phi)\) a set \(\mathfrak {B}(\phi , p)\) of binders in \(\phi\) of variables which are in scope at position \(p\) (note that this can be used to verify the correctness of attribute references in expressions).
    Since we use \(\mathfrak {B}(\phi)\) in the definition of \(\mathfrak {B}(\phi , p)\) we start by defining \(\mathfrak {B}(\phi)\) using recursion on the structure of \(\phi\) (c.f. [18, page 10]):
    \(\begin{gather} \mathfrak {B}(R)=\mathfrak {B}(\phi +):=\emptyset ,\quad \mathfrak {B}(R\mathbin {\small {as}}x)=\mathfrak {B}((\phi +)\small {agr}(\ldots)\mathbin {\small {as}}x):=\lbrace \lbrace []\rbrace \rbrace ,\nonumber \nonumber\\ \mathfrak {B}(\phi \mathbin {\small {filter}}P)=\mathfrak {B}(\phi \mathbin {\small {noskip}}\psi) :=\lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi)\rbrace ,\nonumber \nonumber\\ \mathfrak {B}(\phi _1\mathbin {\small {and}}\phi _2)=\mathfrak {B}(\phi _1\mathbin {;}\phi _2) :=\lbrace [2]+V\;|\;V\in \mathfrak {B}(\phi _2)\rbrace \cup \lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi _1)\wedge v_{\phi _1}(V)\notin v_{\phi _2}(\mathfrak {B}(\phi _2))\rbrace ,\nonumber \nonumber\\ \mathfrak {B}(\phi _1\mathbin {\small {or}}\phi _2):=\lbrace ([1]+ V_1)\cup ([2]+V_2)\;|\;V_1\in \mathfrak {B}(\phi _1) \wedge V_2\in \mathfrak {B}(\phi _2)\wedge v_{\phi _1}(V_1)=v_{\phi _2}(V_2)\rbrace . \end{gather}\)
    (7)
    Above, we abused the notation by writing \(v_{\phi }(\mathfrak {B}(\phi))\) to denote \(\lbrace v_\phi (V)\;|\;V\in \mathfrak {B}(\phi)\rbrace\) , where \(v_\phi (V):=\lbrace v_\phi (p)\;|\;p\in V\rbrace\) . A few comments are in order about the function \(\mathfrak {B}()\) . We already described the binding structure of \(\phi _1\mathbin {\small {or}}\phi _2\) and \(\phi _1\mathbin {;}\phi _2\) . In case of \(\phi _1\mathbin {\small {and}}\phi _2\) , we know that both \(\phi _1\) and \(\phi _2\) have to match. If they both bind the same variable we have to choose which of the bindings is to be shadowed, and which is to be visible. In case of sequential composition ( \(\phi _1\mathbin {;}\phi _2\) ) we just choose the later binding to be visible. However, in case of parallel composition \(\phi _1\mathbin {\small {and}}\phi _2\) , \(\phi _1\) and \(\phi _2\) match in some unspecified order, possibly with interleaving, so, prior to the execution of the matching, we cannot know the ordering of bindings. Instead, we just arbitrarily give preference to the binding by the second subpattern, and shadow the bindings in the first. Finally, since \(\phi \mathbin {\small {noskip}}\psi\) matches when \(\psi\) does not match the skipped events, it follows that \(\psi\) cannot provide bindings.
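    To make the recursion above concrete, the following minimal sketch (in Python, over a hypothetical tuple encoding of patterns that is not part of PatLang) computes \(\mathfrak {B}(\phi)\) for a core fragment of the language; it reproduces the binder computed in Example 4.6 below.
    # A sketch of Equation (7). Hypothetical encoding:
    #   ('ev', 'R')            R               ('as', 'R', 'x')      R as x
    #   ('filter', phi, P)     phi filter P    ('seq'|'and'|'or', phi1, phi2)
    #   ('plus', phi)          phi+            ('agr', phi, 'x')     (phi+) agr(...) as x
    # Positions are tuples of 1-based child indices, () being the root.
    def subpattern(phi, pos):
        for i in pos:
            phi = phi[i]              # children sit at tuple indices 1 and 2
        return phi

    def binder_vars(phi, V):
        """v_phi(V): the variables bound at the positions collected in the binder V."""
        return {subpattern(phi, p)[2] for p in V}

    def binders(phi):
        """B(phi): a set of binders, each binder being a frozenset of positions."""
        op = phi[0]
        if op in ('ev', 'plus'):
            return set()
        if op in ('as', 'agr'):
            return {frozenset({()})}
        if op == 'filter':            # 'noskip' would be handled in the same way
            return {frozenset((1,) + p for p in V) for V in binders(phi[1])}
        if op in ('seq', 'and'):
            b1, b2 = binders(phi[1]), binders(phi[2])
            vars2 = [binder_vars(phi[2], V) for V in b2]
            out = {frozenset((2,) + p for p in V) for V in b2}
            out |= {frozenset((1,) + p for p in V) for V in b1
                    if binder_vars(phi[1], V) not in vars2}
            return out
        if op == 'or':
            return {frozenset((1,) + p for p in V1) | frozenset((2,) + p for p in V2)
                    for V1 in binders(phi[1]) for V2 in binders(phi[2])
                    if binder_vars(phi[1], V1) == binder_vars(phi[2], V2)}
        raise ValueError(op)

    # The pattern of Example 4.6 below (filter conditions kept as opaque strings).
    phi0 = ('filter', ('as', 'S', 'y'), 'x.a = y.a')
    phi1 = ('seq', ('as', 'R', 'x'), phi0)
    phi2 = ('filter', ('as', 'T', 'x'), 'x.a = y.b')
    phi = ('filter', ('or', phi1, phi2), 'x.a = y.c')
    print(binders(phi))   # {frozenset({(1, 1, 1), (1, 2, 1)})}: the binder of x from Example 4.6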
    Example 4.6.
    Consider pattern \(\phi\) from Example 4.3. Denote for brevity
    \(\begin{equation*} \phi _0:=(S\mathbin {\small {as}}y)\mathbin {\small {filter}}x\mathbin {.}a=y\mathbin {.}a,\quad \phi _1:=(R\mathbin {\small {as}}x)\mathbin {;}\phi _0,\quad \phi _2:=(T\mathbin {\small {as}}x) \mathbin {\small {filter}}x\mathbin {.}a=y\mathbin {.}b. \end{equation*}\)
    Thus, \(\phi =(\phi _1\mathbin {\small {or}}\phi _2)\mathbin {\small {filter}}x\mathbin {.}a=y\mathbin {.}c\) . Then we have
    \(\begin{align*} \mathfrak {B}(\phi _0)&=\lbrace [1]+V\;|\;V\in \mathfrak {B}(S\mathbin {\small {as}}y)\rbrace =\lbrace [1]+V\;|\;V\in \lbrace \lbrace []\rbrace \rbrace \rbrace =\lbrace \lbrace [1]\rbrace \rbrace ,\\ \mathfrak {B}(\phi _1)&=\lbrace [2]+V\;|\;V\in \mathfrak {B}(\phi _0)\rbrace \cup \lbrace [1]+V\;|\;V\in \mathfrak {B}(R\mathbin {\small {as}}x)\wedge v_{R\mathbin {\small {as}}x}(V)\notin v_{\phi _0}(\mathfrak {B}(\phi _0))\rbrace \\ &=\lbrace [2]+V\;|\;V\in \lbrace \lbrace [1]\rbrace \rbrace \rbrace \cup \lbrace [1]+V\;|\;V\in \lbrace \lbrace []\rbrace \rbrace \wedge v_{R\mathbin {\small {as}}x}(V)\notin v_{\phi _0}(\lbrace \lbrace [1]\rbrace \rbrace)\rbrace \\ &=\lbrace \lbrace [2,1]\rbrace ,\lbrace [1]\rbrace \rbrace ,\\ \mathfrak {B}(\phi _2)&=\lbrace [1]+V\;|\;V\in \mathfrak {B}(T\mathbin {\small {as}}x)\rbrace =\lbrace \lbrace [1]\rbrace \rbrace ,\\ \mathfrak {B}(\phi _1\mathbin {\small {or}}\phi _2)&=\lbrace ([1]+ V_1)\cup ([2]+V_2)\;|\;V_1\in \mathfrak {B}(\phi _1) \wedge V_2\in \mathfrak {B}(\phi _2)\wedge v_{\phi _1}(V_1)=v_{\phi _2}(V_2)\rbrace \\ & =\lbrace ([1]+ V_1)\cup ([2]+V_2)\\ &\quad \quad \quad \quad \;|\;V_1\in \lbrace \lbrace [2,1]\rbrace ,\lbrace [1]\rbrace \rbrace \wedge V_2\in \lbrace \lbrace [1]\rbrace \rbrace \wedge v_{\phi _1}(V_1)=v_{\phi _2}(V_2)\rbrace \\ &=\lbrace \lbrace [1,1],[2,1]\rbrace \rbrace ,\\ \mathfrak {B}(\phi)&=\lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi _1\mathbin {\small {or}}\phi _2)\rbrace =\lbrace \lbrace [1,1,1], [1,2,1]\rbrace \rbrace . \end{align*}\)
    Example 4.7.
    Consider the following patterns:
    \(\begin{gather*} \phi _1:= (R\mathbin {\small {as}}x)\mathbin {;}(S\mathbin {\small {as}}y),\quad \phi _2:=\phi _1\mathbin {;}(T\mathbin {\small {as}}x),\quad \phi _3:=\phi _2\mathbin {\small {or}}(H\mathbin {\small {as}}y). \end{gather*}\)
    From the definition \(\mathfrak {B}(\phi _1):=\lbrace \lbrace [1]\rbrace ,\lbrace [2]\rbrace \rbrace\) . Since \(v_{\phi _1}(\lbrace [1]\rbrace)=\lbrace x\rbrace\) and \(v_{\phi _1}(\lbrace [2]\rbrace)=\lbrace y\rbrace\) , we have
    \(\begin{align*} \mathfrak {B}(\phi _2)&:= \lbrace [2]+V\;|\;V\in \mathfrak {B}(T\mathbin {\small {as}}x)\rbrace \cup \lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi _1)\wedge v_{\phi _1}(V)\notin v_{T\mathbin {\small {as}}x}(\mathfrak {B}(T\mathbin {\small {as}}x))\rbrace \\ &=\lbrace \lbrace [2]\rbrace \rbrace \cup \lbrace [1]+V\;|\;V\in \lbrace \lbrace [1]\rbrace ,\lbrace [2]\rbrace \rbrace \wedge v_{\phi _1}(V)\notin \lbrace \lbrace x\rbrace \rbrace \rbrace =\lbrace \lbrace [2]\rbrace , \lbrace [1, 2]\rbrace \rbrace ,\\ \mathfrak {B}(\phi _3)&:=\lbrace ([1]+ V_1)\cup ([2]+V_2)\;|\;V_1\in \mathfrak {B}(\phi _2) \wedge V_2\in \mathfrak {B}(H\mathbin {\small {as}}y)\wedge v_{\phi _2}(V_1)=v_{H\mathbin {\small {as}}y}(V_2)\rbrace \\ &=\lbrace ([1]+ V_1)\cup ([2]+V_2)\;|\;V_1\in \lbrace \lbrace [2]\rbrace , \lbrace [1, 2]\rbrace \rbrace \wedge V_2\in \lbrace \lbrace []\rbrace \rbrace \wedge v_{\phi _2}(V_1)=\lbrace y\rbrace \rbrace \\ &=\lbrace \lbrace [1,1,2], [2]\rbrace \rbrace . \end{align*}\)
    We can now define by a structural recursion a set \(\mathfrak {B}(\phi , p)\) of binders in \(\phi\) of variables which are in scope at position \(p\in \mathbb {P}_r(\phi)\) :
    \(\begin{gather} \mathfrak {B}(\phi \mathbin {\small {filter}}P, [])=\mathfrak {B}({\phi +}\small {agr}(\ldots)\mathbin {\small {as}}x, []):=\lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi)\rbrace , \nonumber \nonumber\\ \mathfrak {B}(\phi \mathbin {\small {filter}}P, [1|p])=\mathfrak {B}({\phi +}\small {agr}(\ldots)\mathbin {\small {as}}x, [1|p]) =\mathfrak {B}(\phi {+}, [1|p]):=\lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi , p)\rbrace ,\nonumber \nonumber\\ \mathfrak {B}(\phi _1 \mathbin {\small {or}}\phi _2, [i|p])=\mathfrak {B}(\phi _1\mathbin {\small {and}}\phi _2, [i|p]) :=\lbrace [i]+V\;|\;V\in \mathfrak {B}(\phi _i, p)\rbrace ,\quad i\in \lbrace 1,2\rbrace ,\nonumber \nonumber\\ \mathfrak {B}(\phi _1\mathbin {;}\phi _2, [1|p])=\mathfrak {B}(\phi _1\mathbin {\small {noskip}}\phi _2, [1|p]):= \lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi _1, p)\rbrace ,\nonumber \nonumber\\ \mathfrak {B}(\phi _1\mathbin {;}\phi _2, [2|p]) := \lbrace [2]+V\;|\;V\in \mathfrak {B}(\phi _2, p)\rbrace \cup \lbrace [1]+V\;|\;V\in \mathfrak {B}(\phi _1)\wedge v_{\phi _1}(V)\notin v_{\phi _2}(\mathfrak {B}(\phi _2, p))\rbrace ,\nonumber \nonumber\\ \mathfrak {B}(\phi _1\mathbin {\small {noskip}}\phi _2, [2|p]):= \lbrace [2]+V\;|\;V\in \mathfrak {B}(\phi _2, p)\rbrace . \end{gather}\)
    (8)
    In the patterns of the form \(\phi \mathbin {\small {noskip}}\psi\) variables bound in \(\phi\) are not visible in \(\psi\) . This is justified by the fact that events not matching \(\psi\) might be skipped while matching \(\phi\) before the given variable is bound in \(\phi\) . Alternatively, one could match \(\phi\) skipping everything, and only after \(\phi\) is fully matched reject the whole matching if some skipped events match \(\psi\) . In this case, it would make sense for \(\psi\) to be in the scope of variables matched by \(\phi\) . However, we rejected this idea as we want \(\mathbin {\small {noskip}}\) to serve as a flexible replacement for the strict and skip-till-next-match selection strategies, and hence we need to control the skipping of events as soon as they appear.
    Example 4.8.
    Consider the pattern from Example 4.4. Denote for brevity
    \(\begin{equation*} \psi _0:=R\mathbin {\small {as}}x\mathbin {\small {filter}}x\mathbin {.}a=2,\quad \psi _1:=\psi _0\mathbin {;}(S\mathbin {\small {as}}x),\quad \phi :=\psi _1\mathbin {\small {filter}}x\mathbin {.}b=1. \end{equation*}\)
    Then \(\mathfrak {B}(\psi _0)=\lbrace \lbrace [1]\rbrace \rbrace\) , \(\mathfrak {B}(\psi _1)=\lbrace \lbrace [2]\rbrace \rbrace\) , \(\mathfrak {B}(\phi , []):=\lbrace [1]+V\;|\;V\in \mathfrak {B}(\psi _1)\rbrace =\lbrace \lbrace [1,2]\rbrace \rbrace\) .
    Example 4.9.
    Consider the following patterns:
    \(\begin{equation} \phi _1:= R\mathbin {\small {as}}x\mathbin {\small {filter}}x\mathbin {.}a+y\mathbin {.}b=z\mathbin {.}a, \quad \phi _2:=\phi _1;(S\mathbin {\small {as}}z),\quad \phi _3:=(T\mathbin {\small {as}}y);\phi _2. \end{equation}\)
    (9)
    Then \(\mathbb {P}_r(\phi _1)=\lbrace []\rbrace\) , \(\mathbb {P}_r(\phi _2)=\lbrace [1]\rbrace\) , and \(\mathbb {P}_r(\phi _3)=\lbrace [2,1]\rbrace\) . It is immediate that \(\mathfrak {B}(\phi _1, [])=\lbrace \lbrace [1]\rbrace \rbrace\) . Hence \(\mathfrak {B}(\phi _2, [1])=\lbrace \lbrace [1,1]\rbrace \rbrace\) , and so \(\mathfrak {B}(\phi _3, [2,1])=\lbrace \lbrace [1]\rbrace , \lbrace [2,1,1]\rbrace \rbrace\) .
    It remains to define the set of free variables of a pattern \(\phi\) . First, we define the set \(\small{free}(\phi , p)\) of free variables at position \(p\in \mathbb {P}_r(\phi)\) , which are just variables in filter conditions and aggregate expressions at \(p\) which are not bound by \(\phi\) . Thus, defining for brevity \(\hat{v}_\phi (V):=x\) whenever \(v_\phi (V)=\lbrace x\rbrace\) , we have
    \(\begin{gather*} \small{free}(\phi , p):={\left\lbrace \begin{array}{ll}\small{Vars}(P)\setminus \hat{v}_\phi (\mathfrak {B}(\phi , p)) & \text{if}\;\phi _p=\psi \mathbin {\small {filter}}P,\\ \bigcup _{i=0}^n(\small{Vars}(E_i)\setminus \hat{v}_\phi (\mathfrak {B}(\phi , p)))&\text{if}\; \phi _p=\psi {+}\small {agr}(f_0(E_0),\ldots ,f_n(E_n))\mathbin {\small {as}}x,\end{array}\right.}. \end{gather*}\)
    Above, we abuse the notation by applying \(\hat{v}_\phi\) to sets of binders: For any set of binders \(B\) , we have \(\hat{v}_\phi (B):=\lbrace \hat{v}_\phi (V)\;|\;V\in B\rbrace\) . Then we define the set of free variables of a pattern \(\phi\) by \(\small{free}(\phi):=\textstyle \bigcup _{p\in \mathbb {P}_r(\phi)}\small{free}(\phi , p).\)
    Example 4.10.
    Let \(\phi _3\) be defined as in Equation (9). Then \(\small{Vars}(x\mathbin {.}a+y\mathbin {.}b=z\mathbin {.}a)=\lbrace x,y,z\rbrace\) and \(\hat{v}_{\phi _3}(\mathfrak {B}(\phi _3, [2,1]))=\lbrace x,y\rbrace\) , hence \(\small{free}(\phi _3)=\small{free}(\phi _3, [2,1])=\lbrace z\rbrace\) .
    One can prove that \(\small{free}(\phi)\) can be equivalently defined by recursion on the structure of \(\phi\) as follows:
    \(\begin{gather} \small{free}(R)=\small{free}(R\mathbin {\small {as}}x):=\emptyset ,\quad \small{free}(\phi \mathbin {\small {filter}}P):=\small{free}(\phi)\cup (\small{Vars}(P)\setminus \mathfrak {B}_v(\phi)),\nonumber \nonumber\\ \small{free}(\phi +):=\small{free}(\phi),\quad \small{free}(\phi _1\mathbin {\small {or}}\phi _2)=\small{free}(\phi _1\mathbin {\small {and}}\phi _2):=\small{free}(\phi _1)\cup \small{free}(\phi _2),\nonumber \nonumber\\ \small{free}(\phi \mathbin {;}\psi):=\small{free}(\phi)\cup (\small{free}(\psi)\setminus \mathfrak {B}_v(\phi)), \quad \small{free}(\phi \mathbin {\small {noskip}}\psi):=\small{free}(\phi)\cup \small{free}(\psi), \nonumber \nonumber\\ \small{free}({\phi +}\small {agr}(a_0:f_0(E_0),a_1:f_1(E_1),\ldots)\mathbin {\small {as}}x) :=\small{free}(\phi)\cup \textstyle \bigcup _i(\small{Vars}(E_i)\setminus \mathfrak {B}_v(\phi)). \end{gather}\)
    (10)
    where we denoted for brevity \(\mathfrak {B}_v(\phi):=\hat{v}_\phi (\mathfrak {B}(\phi))\) (the set of variables bound by \(\phi\) ).
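    The following small sketch (again over a hypothetical tuple encoding, with a filter condition represented only by the set of variables it references, i.e., by \(\small{Vars}(P)\) ) illustrates the recursive characterisation of \(\small{free}(\phi)\) from Equation (10) for the fragment consisting of \(R\) , \(R\mathbin {\small {as}}x\) , \(\mathbin {\small {filter}}\) and \(\mathbin {;}\) ; it reproduces the result of Example 4.10.
    # A sketch of Equation (10) for a small fragment; the encoding is illustrative only.
    def bound_vars(phi):
        """B_v(phi): the set of variables bound by phi."""
        op = phi[0]
        if op == 'ev': return set()
        if op == 'as': return {phi[2]}
        if op == 'filter': return bound_vars(phi[1])
        if op == 'seq': return bound_vars(phi[1]) | bound_vars(phi[2])
        raise ValueError(op)

    def free(phi):
        op = phi[0]
        if op in ('ev', 'as'):
            return set()
        if op == 'filter':                  # phi = ('filter', psi, Vars(P))
            return free(phi[1]) | (phi[2] - bound_vars(phi[1]))
        if op == 'seq':                     # variables bound on the left are in scope on the right
            return free(phi[1]) | (free(phi[2]) - bound_vars(phi[1]))
        raise ValueError(op)

    # Example 4.9/4.10: phi3 = (T as y); ((R as x filter x.a+y.b=z.a); (S as z)).
    phi1 = ('filter', ('as', 'R', 'x'), {'x', 'y', 'z'})
    phi3 = ('seq', ('as', 'T', 'y'), ('seq', phi1, ('as', 'S', 'z')))
    print(free(phi3))    # {'z'}, as in Example 4.10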

    5 Minimal Semantics

    In this section, we describe the first of two denotational semantics of PatLang: the minimal semantics. As we argue in the next section, it does not provide sufficient information for convenient, tool-assisted verification of patterns with respect to their intent. It is, however, a minimalistic semantics which is sufficient to verify implementations of PatLang and their executable semantics.

    5.1 Semantics of Expressions and Conditions

    Predicate formulas and expressions occurring in patterns are syntactic objects, and hence they are treated as uninterpreted terms. Denotational semantics assigns values to expressions and logical validity to conditions (after they are grounded with an appropriate substitution). More precisely, let \(E\) be an expression and let \(\sigma\) be a substitution such that \(\small{Vars}(E)\subseteq \small{Dom}(\sigma)\) . Then \(⟦ E⟧ _\sigma \in \mathcal {D}\) is the value of the expression \(E\) . \(⟦ E⟧ _\sigma\) is defined by recursion on the structure of \(E\) in an obvious way, e.g., if \(v\in \mathcal {D}\) then \(⟦ v⟧ _\sigma :=v\) , if \(x\in \small{Vars}\) and \(a\) is an attribute then \(⟦ x\mathbin {.}a⟧ _\sigma =\sigma (x)\mathbin {.}a\) , \(⟦ E_1+E_2⟧ _\sigma :=⟦ E_1⟧ _\sigma +⟦ E_2⟧ _\sigma\) , and so on. Similarly, let \(P\) be a condition, and \(\sigma\) be a substitution such that \(\small{Vars}(P)\subseteq \small{Dom}(\sigma)\) . Then \(⟦ P⟧ _\sigma\) is a boolean value. \(⟦ P⟧ _\sigma\) is defined by recursion on the structure of \(P\) in an obvious way, e.g., \(⟦ E_1=E_2⟧ _\sigma\) iff \(⟦ E_1⟧ _\sigma =⟦ E_2⟧ _\sigma\) , and so on. Aggregate functions are interpreted as functions from lists of values to values. E.g., \(⟦ \text{sum}⟧ ([v_0,\ldots ,v_n]):=v_0+\cdots +v_n\) .
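    As an illustration, the following sketch evaluates a few tuple-encoded expressions and conditions under a substitution represented as a Python dictionary mapping variables to events (events themselves being dictionaries of attributes); the encoding is ours and purely illustrative.
    # A sketch of [[E]]_sigma and [[P]]_sigma over a hypothetical term encoding.
    def eval_expr(E, sigma):
        if isinstance(E, (int, float)):      # a constant v in the value domain D
            return E
        if E[0] == 'attr':                   # ('attr', 'x', 'a')  ~  x.a
            return sigma[E[1]][E[2]]
        if E[0] == '+':
            return eval_expr(E[1], sigma) + eval_expr(E[2], sigma)
        raise ValueError(E)

    def eval_cond(P, sigma):
        if P[0] == '=':
            return eval_expr(P[1], sigma) == eval_expr(P[2], sigma)
        if P[0] == 'and':
            return eval_cond(P[1], sigma) and eval_cond(P[2], sigma)
        raise ValueError(P)

    def agg_sum(values):                     # [[sum]]([v0, ..., vn]) := v0 + ... + vn
        return sum(values)

    sigma = {'x': {'a': 2}, 'y': {'a': 1, 'b': 1}}
    cond = ('=', ('attr', 'x', 'a'), ('+', ('attr', 'y', 'a'), ('attr', 'y', 'b')))
    print(eval_cond(cond, sigma))            # True: x.a = y.a + y.b under sigma
    print(agg_sum([1, 2, 3]))                # 6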

    5.2 Semantics of Patterns

    Given
    a pattern \(\phi\) in PatLang,
    an ordered stream \(\Gamma :\mathbb {N}\rightarrow \small{Events}_\mathcal {U}\) ,
    a natural number \(m\in \mathbb {N}\) representing a position in \(\Gamma\) from which we start matching with \(\phi\) ,
    a substitution \(\mu\) such that \(\small{free}(\phi)\subseteq \small{Dom}(\mu)\) which assigns events to free variables of \(\phi\) ,
    the minimal semantics associates with \(\phi\) a set \(⟦ \phi ⟧ ^{\small{min}, m}_{\Gamma , \mu }\) of pairs. Each pair \((n, \sigma)\) corresponds to a successful matching of \(\phi\) against \(\Gamma\) starting at position \(m\) and consists of
    a position \(n\ge m\) in the stream \(\Gamma\) of the last matched event (i.e., this is the last position in \(\Gamma\) which needs to be examined before reporting a successful matching with \(\phi\) ),
    a substitution \(\sigma\) with \(\small{Dom}(\sigma)=\mathfrak {B}_v(\phi)\) which represents values of unshadowed variables in \(\phi\) bound during this particular matching.
    Before we define the semantics formally, let us consider an example to illustrate our choices:
    Example 5.1.
    Let \(\phi _1:= (\text{get}\mathbin {\small {as}}x)\mathbin {;}(\psi _1\mathbin {\small {noskip}}\psi _1)\) where \(\psi _1:= \text{put}\mathbin {\small {as}}y\mathbin {\small {filter}}x\mathbin {.}\mathit {id}=y\mathbin {.}\mathit {id}\) . Consider the following fragment of a stream \(\Gamma\) (we use event schema from Section 2):
    Clearly, \((7, \sigma _1)\in ⟦ \phi _1⟧ ^{\small{min}, 2}_{\Gamma , \emptyset }\) , where \(\sigma _1:=\lbrace x/\text{get}(7, 1, 10, 5), y/\text{put}(7,5,10,19)\rbrace\) . Leaving substitutions aside for a moment, the minimal semantics effectively represents matchings of patterns against an event stream \(\Gamma\) as contiguous, finite intervals of positions in \(\Gamma\) . In this case, the considered matching consists of the positions from 2 (the start of the matching) up to and including 7 (the position of the last matched event). Is this reasonable? Obviously, we want to include the positions of the events actually matched, i.e., positions 3 and 7 (as in the semantics in [18]). Furthermore, because of the presence of the \(\mathbin {\small {noskip}}\) conditions, the events in between are not irrelevant: they must not contain proscribed events. Finally, while in the above pattern the events before the first matched event (i.e., the event at position 3) are not constrained, in general we can also put conditions on events preceding the first matched one, as in the following pattern:
    \(\begin{equation*} \phi _2:= (\text{get}\mathbin {\small {as}}x\mathbin {\small {noskip}}\text{get}\mathbin {\small {as}}y)\mathbin {;}(\psi _1\mathbin {\small {noskip}}\psi _1). \end{equation*}\)
    We want to define \(⟦ \phi ⟧ ^{\small{min}, m}_{\Gamma , \mu }\) by structural recursion. There is an obstacle with subpatterns of the form \(\phi \mathbin {\small {noskip}}\psi\) because when computing recursively \(⟦ \phi ⟧ ^{\small{min}, m}_{\Gamma , \mu }\) we need to pass information about positions matched by \(\psi\) so that they are not skipped when matching \(\phi\) . However, then there is another problem. Consider a pattern \(\phi :=(\phi _1\mathbin {\small {and}}\phi _2)\mathbin {\small {noskip}}\psi\) and an event matched by \(\psi\) at position \(i\) in the matched stream \(\Gamma\) . In the recursive definition, we pass \(i\) to the semantics of both \(\phi _1\) and \(\phi _2\) as an unskippable position. However, we have to allow skipping \(\Gamma _i\) in either \(\phi _1\) or \(\phi _2\) , as long as it is not skipped in both, and there is no simple way of doing that. This forces us to introduce an auxiliary semantics \(⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu }\) which is essentially a reformulation of semantics from [18]: \(⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu }\) is a set of pairs \((X,\sigma)\) where \(X\) is a set of positions of events in \(\Gamma\) which were matched by \(\phi\) and \(\sigma\) is a substitution. Then we define
    \(\begin{equation} ⟦ \phi ⟧ ^{\small{min}, m}_{\Gamma , \mu } :=\bigl \lbrace (\max (X),\sigma)\;|\;(X,\sigma)\in ⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu }\bigr \rbrace . \end{equation}\)
    (11)
    The advantage is that now we can formally relate both the minimal semantics and, later, tree semantics to the semantics from [18]. Let \(x\) be any event. Then we define \(\small{Type}(x,R)\) to be true iff either \(x\) is of type \(R\) or \(R=?\) (i.e., \(\small{Type}(x,?)\) holds for any event). Given any integers \(m,n\in \mathbb {Z}\) we denote \([m,n):=\lbrace k\in \mathbb {Z}\;|\;m\le k\lt n\rbrace\) . Finally, given any pair \(z=(x, y),\) we denote projection on the first component by \(\pi _1(z):=x\) . Then sets \(⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu }\) are defined by recursion on the structure of \(\phi\) as follows:
    \(\begin{align} ⟦ R⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (\lbrace n\rbrace , \emptyset)\;|\;n\in \mathbb {N}\wedge n\ge m\wedge \small{Type}(\Gamma _n,R)\rbrace , \end{align}\)
    (12a)
    \(\begin{align} ⟦ R\mathbin {\small {as}}x⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (\lbrace n\rbrace , \lbrace x/\Gamma _n\rbrace)\;|\;n\in \mathbb {N}\wedge n\ge m\wedge \small{Type}(\Gamma _n,R)\rbrace , \end{align}\)
    (12b)
    \(\begin{align} ⟦ \phi \mathbin {\small {filter}}P⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (X,\sigma)\in ⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu } \;|\; ⟦ P⟧ _{\mu \circ \sigma }\rbrace , \end{align}\)
    (12c)
    \(\begin{align} ⟦ \phi _1\mathbin {\small {or}}\phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (X, \sigma |_{\mathfrak {B}_v(\phi _1)\cap \mathfrak {B}_v(\phi _2)})\;|\; (X,\sigma)\in ⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu } \cup ⟦ \phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu }\rbrace , \end{align}\)
    (12d)
    \(\begin{align} ⟦ \phi _1\!\mathbin {\small {and}}\!\phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (X_1\cup X_2, \sigma _1\circ \sigma _2)\;|\;(X_1, \sigma _1)\in ⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu }\wedge (X_2, \sigma _2)\in ⟦ \phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu }\rbrace ,\! \end{align}\)
    (12e)
    \(\begin{align} ⟦ \phi _1\mathbin {;}\phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (X_1\cup X_2, \sigma _1\circ \sigma _2)\;|\;(X_1, \sigma _1)\in ⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu }\wedge (X_2, \sigma _2)\in ⟦ \phi _2⟧ ^{\small{G}, \max (X_1)+1}_{\Gamma , \mu \circ \sigma _1}\rbrace , \end{align}\)
    (12f)
    \(\begin{align} ⟦ \phi \mathbin {\small {noskip}}\psi ⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (X,\sigma)\in ⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu }\;|\; [m,\max (X))\cap \textstyle \bigcup \pi _1(⟦ \psi ⟧ ^{\small{G}, m}_{\Gamma , \mu })\subseteq X\rbrace , \end{align}\)
    (12g)
    \(\begin{align} ⟦ \phi {+}⟧ ^{\small{G}, m}_{\Gamma , \mu } &:=\lbrace (\textstyle \bigcup _iX_i, \emptyset)\;|\;\exists _{(k\gt 0, X_1,\ldots ,X_k,\sigma _1,\ldots ,\sigma _k)} (X_1, \sigma _1)\in ⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu } \nonumber \nonumber\\ &\quad \quad \quad \wedge (X_2, \sigma _2)\in ⟦ \phi ⟧ ^{\small{G}, \max (X_1)+1}_{\Gamma , \mu }\wedge \ldots \wedge (X_k, \sigma _k)\in ⟦ \phi ⟧ ^{\small{G}, \max (X_{k-1})+1}_{\Gamma , \mu } \rbrace , \end{align}\)
    (12h)
    \(\begin{align} &⟦ \phi {+}\small {agr}(a_0:f_0(E_0), a_1:f_1(E_1), \ldots)\mathbin {\small {as}}x⟧ ^{\small{G}, m}_{\Gamma , \mu }\nonumber \nonumber\\ &\quad \quad :=\bigl \lbrace (\textstyle \bigcup _iX_i, \lbrace x/e\rbrace)\;|\;\exists _{(k\gt 0, X_1,\ldots ,X_k,\sigma _1,\ldots ,\sigma _k)} (X_1, \sigma _1)\in ⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu } \wedge (X_2, \sigma _2)\in ⟦ \phi ⟧ ^{\small{G}, \max (X_1)+1}_{\Gamma , \mu } \nonumber \nonumber\\ &\quad \quad \quad \wedge \ldots \wedge (X_k, \sigma _k)\in ⟦ \phi ⟧ ^{\small{G}, \max (X_{k-1})+1}_{\Gamma , \mu } \nonumber \nonumber\\ &\quad \quad \quad \quad \quad \quad {} \wedge e=\small {agr}\bigl (⟦ f_0⟧ (⟦ E_0⟧ _{\sigma _1,\ldots ,\sigma _k}^\mu), ⟦ f_1⟧ (⟦ E_1⟧ _{\sigma _1,\ldots ,\sigma _k}^\mu), \ldots , \Gamma _{\max (X_k)}\mathbin {.}\tau \bigr)\bigr \rbrace . \end{align}\)
    (12i)
    where we denoted \(⟦ E⟧ _{\sigma _1,\ldots ,\sigma _k}^\mu :=[ ⟦ E⟧ _{\mu \circ \sigma _1},⟦ E⟧ _{\mu \circ \sigma _2},\ldots , ⟦ E⟧ _{\mu \circ \sigma _k}]\) . Recall that \(\mathfrak {B}_v(\phi)\) is the set of variables bound by \(\phi\) (see the paragraph after Equation (10)).
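    The following runnable sketch implements Equations (11) and (12) for the fragment consisting of \(R\) , \(R\mathbin {\small {as}}x\) , \(\mathbin {\small {filter}}\) , \(\mathbin {\small {or}}\) and \(\mathbin {;}\) , restricted to a finite stream prefix; the encodings of patterns, events and predicates are hypothetical and serve only to make the recursion explicit.
    # A sketch of [[phi]]^{G,m} (Equation (12)) and of the minimal semantics (Equation (11)).
    # Events are dicts with a 'type' key; predicates are Python callables over a substitution.
    def compose(s1, s2):
        """Substitution composition; bindings in s2 shadow those in s1."""
        out = dict(s1); out.update(s2); return out

    def bound_vars(phi):
        op = phi[0]
        if op == 'ev': return set()
        if op == 'as': return {phi[2]}
        if op == 'filter': return bound_vars(phi[1])
        if op == 'or': return bound_vars(phi[1]) & bound_vars(phi[2])
        if op == 'seq': return bound_vars(phi[1]) | bound_vars(phi[2])
        raise ValueError(op)

    def sem_G(phi, stream, m, mu):
        """All pairs (X, sigma) with X the frozenset of matched positions."""
        op = phi[0]
        if op in ('ev', 'as'):
            hits = [n for n in range(m, len(stream))
                    if phi[1] == '?' or stream[n]['type'] == phi[1]]
            return [(frozenset({n}), {} if op == 'ev' else {phi[2]: stream[n]}) for n in hits]
        if op == 'filter':                            # ('filter', psi, pred)
            return [(X, s) for (X, s) in sem_G(phi[1], stream, m, mu) if phi[2](compose(mu, s))]
        if op == 'or':
            shared = bound_vars(phi[1]) & bound_vars(phi[2])
            return [(X, {v: e for v, e in s.items() if v in shared})
                    for (X, s) in sem_G(phi[1], stream, m, mu) + sem_G(phi[2], stream, m, mu)]
        if op == 'seq':
            return [(X1 | X2, compose(s1, s2))
                    for (X1, s1) in sem_G(phi[1], stream, m, mu)
                    for (X2, s2) in sem_G(phi[2], stream, max(X1) + 1, compose(mu, s1))]
        raise ValueError(op)

    def sem_min(phi, stream, m, mu):
        """Minimal semantics: keep only the position of the last matched event."""
        return [(max(X), s) for (X, s) in sem_G(phi, stream, m, mu)]

    stream = [{'type': 'R', 'a': 1}, {'type': 'S', 'a': 1}, {'type': 'R', 'a': 2}]
    phi = ('seq', ('as', 'R', 'x'),
           ('filter', ('as', 'S', 'y'), lambda s: s['x']['a'] == s['y']['a']))
    print(sem_min(phi, stream, 0, {}))   # [(1, {'x': {'type': 'R', 'a': 1}, 'y': {'type': 'S', 'a': 1}})]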
    Now we can define semantic equivalence according to the minimal semantics.
    Definition 5.2.
    Two patterns \(\phi _1\) and \(\phi _2\) are said to be equivalent with respect to minimal semantics if and only if \(⟦ \phi _1⟧ ^{\small{min}, m}_{\Gamma , \mu } =⟦ \phi _2⟧ ^{\small{min}, m}_{\Gamma , \mu }\) for all streams \(\Gamma\) , \(m\in \mathbb {N}\) , and all substitutions \(\mu\) with \(\small{free}(\phi _1)\cup \small{free}(\phi _2)\subseteq \small{Dom}(\mu)\) .
    Minimal semantics is strictly weaker than \(⟦ \cdot ⟧ ^{\small{G}, m}_{\Gamma , \mu }\) , i.e., there exist patterns \(\phi _1\) and \(\phi _2\) such that there exist \(\Gamma\) , \(\mu\) and \(m\) such that \(⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu } \ne ⟦ \phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu }\) , but \(⟦ \phi _1⟧ ^{\small{min}, m}_{\Gamma , \mu } =⟦ \phi _2⟧ ^{\small{min}, m}_{\Gamma , \mu }\) . Indeed, let \(\phi _1:=R\mathbin {\small {as}}x\) and let \(\phi _2:=(R\mathbin {\small {as}}x\mathbin {;}R\mathbin {\small {as}}x)\mathbin {\small {or}}(R\mathbin {\small {as}}x)\) . We can immediately see that, for any \(\Gamma\) , \(\mu\) and \(m\) , we have that \(⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu } \subseteq ⟦ \phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu }\) and that \(⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu }\) contains only pairs of the form \((\lbrace n\rbrace ,\lbrace x/\Gamma _n\rbrace)\) . However, \(⟦ \phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu }\) may also contain pairs of the form \((\lbrace k,n\rbrace ,\lbrace x/\Gamma _n\rbrace)\) where \((\lbrace n\rbrace ,\lbrace x/\Gamma _n\rbrace)\in ⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu }\) and \(k\lt n\) . It follows that, in general, \(⟦ \phi _1⟧ ^{\small{G}, m}_{\Gamma , \mu } \ne ⟦ \phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu }\) , but \(\phi _1\) and \(\phi _2\) are equivalent with respect to the minimal semantics, i.e., \(⟦ \phi _1⟧ ^{\small{min}, m}_{\Gamma , \mu } =⟦ \phi _2⟧ ^{\small{min}, m}_{\Gamma , \mu }\) for all \(\Gamma\) , \(\mu\) and \(m\) . We believe that the minimal semantics is the correct semantics to use when defining equivalence between patterns.
    For some patterns, the events between the start of the matching and the first matched event are irrelevant, as they are unconditionally skipped. This property is captured by the following recursive definition:
    Definition 5.3.
    We call a pattern unanchored if it is of one of the following forms:
    \(\begin{gather*} R\mathbin {\small {as}}x,\quad R,\quad \phi \mathbin {\small {filter}}P,\quad \phi _1\mathbin {\small {or}}\phi _2,\quad \phi _1\mathbin {\small {and}}\phi _2,\quad \phi \mathbin {;}\phi ^{\prime },\quad \phi {+},\quad \phi {+}\small {agr}(\ldots)\mathbin {\small {as}}x, \end{gather*}\)
    where \(\phi\) , \(\phi _1\) and \(\phi _2\) are unanchored patterns and \(\phi ^{\prime }\) is any pattern. We call a pattern anchored if it is not unanchored.
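    Definition 5.3 is a purely syntactic condition, so it can be checked by a direct recursive traversal of the pattern; a minimal sketch, over the hypothetical tuple encoding used in the earlier sketches, is given below.
    # A sketch of Definition 5.3; 'noskip' patterns are anchored.
    def unanchored(phi):
        op = phi[0]
        if op in ('ev', 'as'):
            return True
        if op in ('filter', 'plus', 'agr'):
            return unanchored(phi[1])
        if op in ('or', 'and'):
            return unanchored(phi[1]) and unanchored(phi[2])
        if op == 'seq':                 # phi ; phi' : only the left operand matters
            return unanchored(phi[1])
        return False

    print(unanchored(('seq', ('as', 'R', 'x'), ('noskip', ('ev', 'S'), ('ev', 'T')))))  # True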
    In general, the parametrization of the semantics of patterns by the starting position of the matching is not merely required by the inductive definition; it is integral to their meaning. However, the semantics of an unanchored pattern \(\phi\) is completely captured by \(⟦ \phi ⟧ ^{\small{min}, 0}_{\Gamma , \mu }\) :
    Lemma 5.4.
    Let \(\phi\) be an unanchored pattern. Then, for each stream \(\Gamma\) , each substitution \(\mu\) such that \(\small{Dom}(\mu)\supseteq \small{free}(\phi)\) , and each \(m\in \mathbb {N}\) , we have
    \(\begin{equation*} ⟦ \phi ⟧ ^{\small{min}, n}_{\Gamma , \mu }\subseteq ⟦ \phi ⟧ ^{\small{min}, m}_{\Gamma , \mu }\quad \text{for all}\quad n\ge m. \end{equation*}\)
    The lemma can be proven by an immediate induction, the details of which are left to the reader. The only subtle case is that of concatenation ( \(;\) ), where the proof crucially depends on the fact that there is no pattern matching the empty complex event. Unfortunately, not every pattern which satisfies the above inclusion is unanchored, e.g., any pattern of the form \(\phi \mathbin {\small {or}}(\phi \mathbin {\small {noskip}}\psi)\) is anchored, but it still satisfies the conclusion of Lemma 5.4 if \(\phi\) is unanchored.
    The minimal semantics is the simplest semantics sufficient for validating any implementation of our pattern language. Indeed, the essential thing we care about in any implementation of a pattern language is: “given a stream and a start position, when does the pattern match and which events are bound to variables by the pattern?”, and this is precisely what the minimal semantics provides.

    6 Tree Semantics

    In [18, 19] the denotational semantics of patterns is essentially given by a set of sets of positions in the stream, where each set describes the matched events. Neither article considers negative patterns; hence, events which were not matched were irrelevant there. The semantics described in [18, 19] give both more and less information than the minimal semantics: less, because they do not provide the position at which the matching starts, and the first matched event is no proxy for it; more, because they provide the positions of the matched events. We already explained why disregarding the initial unmatched events is wrong in the semantics of a pattern language with negative patterns. However, we also want to argue that providing the matched events is either excessive or insufficient, depending on the purpose of one's semantics: excessive, if the purpose is to provide a golden standard against which implementations are evaluated, since then we do not care which events were matched, only when the matching ends and what the resulting binding of variables is. On the other hand, as we will argue later, the set of matched positions is insufficient to provide a satisfactory explanation of the matching, which is often needed when we want to verify the correctness of the pattern with respect to its intended meaning. Thus, in this section we introduce another semantics of PatLang, called tree semantics, which preserves detailed information correlating matched events with the subpatterns with which they actually matched. Thus, tree semantics provides information about the roles assigned to the matched events by the pattern.
    Tree semantics associates with each successful matching of a pattern \(\phi\) with a stream \(\Gamma\) a set of lists of integers called a matching tree. This set consists essentially of positions of subpatterns of the forms \(R\) or \(R\mathbin {\small {as}}x\) in \(\phi\) augmented with
    additional indices \(n\) directly below positions of iteration operator (say, \(\psi +\) or \(\psi {+}\small {agr}(\ldots)\mathbin {\small {as}}x\) ) indicating that this position corresponds to the \(n\) th iterated matching of \(\psi\) ,
    for each position of subpattern of the form \(R\) or \(R\mathbin {\small {as}}x\) there is an additional list element at the end corresponding to the position of event in \(\Gamma\) matched by this subpattern.
    In case of subpatterns of the form \(\phi _1\mathbin {\small {or}}\phi _2\) , associated sets of matching trees contain only positions corresponding to either \(\phi _1\) or \(\phi _2\) , but not both, depending on which alternative is actually matched. In general, matching trees contain only those positions where some event is matched. More precisely, we have
    Definition 6.1.
    A set of matching trees of a pattern \(\phi\) (denoted \(\small{m}_{\small{trees}}(\phi)\) ) is defined by recursion on the structure of \(\phi\) as follows:
    \(\begin{align*} \small{m}_{\small{trees}}(R)&=\small{m}_{\small{trees}}(R\mathbin {\small {as}}x):=\bigl \lbrace \lbrace [n]\rbrace \;|\;n\in \mathbb {N}\bigr \rbrace ,\\ \small{m}_{\small{trees}}(\phi \mathbin {\small {filter}}P)&=\small{m}_{\small{trees}}(\phi \mathbin {\small {noskip}}\psi):=\small{m}_{\small{trees}}(\phi),\\ \small{m}_{\small{trees}}(\phi _1\mathbin {\small {or}}\phi _2)&:=\bigl \lbrace [1]+D\;|\;D\in \small{m}_{\small{trees}}(\phi _1)\bigr \rbrace \cup \bigl \lbrace [2]+D\;|\;D\in \small{m}_{\small{trees}}(\phi _2)\bigr \rbrace ,\\ \small{m}_{\small{trees}}(\phi _1\mathbin {\small {and}}\phi _2)&=\small{m}_{\small{trees}}(\phi _1\mathbin {;}\phi _2)\\ & :=\bigl \lbrace ([1]+D_1)\cup ([2]+D_2)\;|\;D_1\in \small{m}_{\small{trees}}(\phi _1)\wedge D_2\in \small{m}_{\small{trees}}(\phi _2)\bigr \rbrace ,\\ \small{m}_{\small{trees}}(\phi +)&=\small{m}_{\small{trees}}(\phi {+}\small {agr}(\ldots)\mathbin {\small {as}}x)\\ &:=\Bigl \lbrace \textstyle \bigcup _{i=0}^n([i]+D_i)\;\Big |\;n\in \mathbb {N}\wedge \forall _{i\in \lbrace 0,\ldots ,n\rbrace }D_i\in \small{m}_{\small{trees}}(\phi)\Bigr \rbrace . \end{align*}\)
    We flatten the structure of a matching tree at subpatterns of the form \(\phi \mathbin {\small {filter}}P\) and \(\phi \mathbin {\small {noskip}}\psi\) since additional nodes do not provide any useful information.
    For any list \(p\) we denote by \(\lambda (p)\) the last element of \(p\) . Similarly, for a set of lists \(D\) we define \(\lambda (D):=\lbrace \lambda (p)\;|\;p\in D\rbrace\) . In particular, for any matching tree \(D\in \small{m}_{\small{trees}}(\phi)\) , \(\max (\lambda (D))\) is the last position of a matched event and \(\min (\lambda (D))\) is the first such position. For brevity, we denote \(\Lambda (D):=\max (\lambda (D))\) .
    We call elements of \(\small{m}_{\small{trees}}(\phi)\) matching trees since any set of lists can be visualised as a tree where lists are interpreted as paths from the root, the root is labelled by the empty list, and the other nodes are labelled by list elements. Thus, e.g., any element of \(\small{m}_{\small{trees}}(R\mathbin {\small {as}}x)\) is of the form \(\lbrace [n]\rbrace\) where \(n\in \mathbb {N}\) , which can be visualised as a root with a single child labelled \(n\) . Similarly, any element of \(\small{m}_{\small{trees}}((R\mathbin {\small {as}}x)\mathbin {;}(S\mathbin {\small {as}}y))\) is of the form \(\lbrace [1, n], [2, m]\rbrace\) , where \(n,m\in \mathbb {N}\) , which can be visualised as a root with two children labelled 1 and 2, whose respective children are leaves labelled \(n\) and \(m\) . A more complex example is presented below:
    Fig. 7.
    Fig. 7. Example of a matching tree (see Example 6.2).
    Example 6.2.
    Let events of type \(R\) and \(S\) have specifications \(R(\mathit {id}, \tau)\) and \(S(\mathit {id}, t, \tau)\) , respectively. Consider the pattern \(\phi :=(R\mathbin {\small {as}}x)\mathbin {;}\phi ^{\prime }\) , where
    \(\begin{gather*} \phi ^{\prime }:=(S\mathbin {\small {as}}y\mathbin {\small {filter}}x\mathbin {.}\mathit {id}=y\mathbin {.}\mathit {id}){+}\small {agr}(t_{\text{min}}:\text{min}(y\mathbin {.}t))\mathbin {\small {as}}x\mathbin {\small {filter}}x\mathbin {.}t_{\text{min}}\ge 4.0 \end{gather*}\)
    It is easy to see that any element of \(\small{m}_{\small{trees}}(\phi)\) is of the form \(\lbrace [1,n], [2,1,m_1], [2, 2, m_2], \ldots ,\) \([2,k, m_k]\rbrace\) where \(n,k, m_1, m_2, \ldots , m_k\in \mathbb {N}\) . An example of a matching tree in \(\small{m}_{\small{trees}}(\phi)\) is shown in Figure 7. The figure also contains an example event stream \(\Gamma\) , and leaves of a matching tree are connected to appropriate events.
    A matching tree \(D\in \small{m}_{\small{trees}}(\phi)\) associated with a given matching of a stream with a pattern \(\phi\) describes completely the result of this matching. In particular, we can extract from \(D\) the values of the variables bound by \(\phi\) . We need to define this extraction precisely before we define the tree semantics. Let \(\mu\) be a substitution such that \(\small{free}(\phi)\subseteq \small{Dom}(\mu)\) . We denote by \(\sigma ^{D, \Gamma , \mu }_\phi\) the substitution which describes the variable bindings provided by the matching tree \(D\) , where the free variables of \(\phi\) are substituted by \(\mu\) . More precisely, \(\small{Dom}(\sigma ^{D, \Gamma , \mu }_\phi)=\mathfrak {B}_v(\phi),\) and, for each \(x\in \mathfrak {B}_v(\phi)\) , \(\sigma ^{D, \Gamma , \mu }_\phi (x)\) is the event in \(\small{Events}_\mathcal {R}\) bound to \(x\) by \(\phi\) during the matching of \(\Gamma\) with which \(D\) is associated. We can define \(\sigma ^{D, \Gamma ,\mu }_\phi\) by recursion on the structure of \(\phi\) as follows:
    \(\begin{gather} \sigma ^{D, \Gamma , \mu }_R=\sigma ^{D, \Gamma , \mu }_{\phi +}:=\emptyset ,\quad \sigma ^{D, \Gamma , \mu }_{\phi \mathbin {\small {filter}}P} = \sigma ^{D, \Gamma , \mu }_{\phi \mathbin {\small {noskip}}\psi } :=\sigma ^{D, \Gamma , \mu }_\phi ,\nonumber \nonumber\\ \sigma ^{([1]+D_1)\cup ([2]+D_2), \Gamma , \mu }_{\phi _1\mathbin {\small {and}}\phi _2} :=\sigma ^{D_1, \Gamma , \mu }_{\phi _1}\circ \sigma ^{D_2, \Gamma , \mu }_{\phi _2},\quad \sigma ^{([1]+D_1)\cup ([2]+D_2), \Gamma , \mu }_{\phi _1\mathbin {;}\phi _2} :=\sigma ^{D_1, \Gamma , \mu }_{\phi _1}\circ \sigma ^{D_2, \Gamma , \mu \circ \sigma ^{D_1, \Gamma , \mu }_{\phi _1}}_{\phi _2},\nonumber \nonumber\\ \sigma ^{\lbrace [i]\rbrace , \Gamma , \mu }_{R\mathbin {\small {as}}x}:=\lbrace x/\Gamma _i\rbrace ,\quad \sigma ^{[i]+D, \Gamma , \mu }_{\phi _1\mathbin {\small {or}}\phi _2} :=\sigma ^{D, \Gamma , \mu }_{\phi _i}\big |_{\mathfrak {B}_v(\phi _1)\cap \mathfrak {B}_v(\phi _2)},\nonumber \nonumber\\ \sigma ^{\bigcup \lbrace [i]+D_i\;|\;i\in \lbrace 0, \ldots , n\rbrace \rbrace , \Gamma , \mu }_{{\phi +}\small {agr}(a_0:f_0(E_0), a_1:f_1(E_1),\ldots)\mathbin {\small {as}}x}:=\bigl \lbrace x/\!\!\small {agr}(⟦ f_0⟧ ([⟦ E_0⟧ ]), ⟦ f_1⟧ ([⟦ E_1⟧ ]), \ldots , \Gamma _{\Lambda (D_n)}\mathbin {.}\tau)\bigr \rbrace , \end{gather}\)
    (13)
    where we denoted for brevity \([⟦ E_i⟧ ]:=[⟦ E_i⟧ _{\mu \circ \sigma ^{D_0, \Gamma , \mu }_\phi }, \ldots , ⟦ E_i⟧ _{\mu \circ \sigma ^{D_n, \Gamma , \mu }_\phi }]\)
    Example 6.3.
    Consider pattern \(\phi\) defined in Example 6.2 and an event stream \(\Gamma\) and matching tree \(D\in \small{m}_{\small{trees}}(\phi)\) presented in Figure 7. It is easy to see that \(\small{free}(\phi)=\emptyset\) (i.e., \(\phi\) is closed). We will compute \(\sigma ^{D, \Gamma , \emptyset }_\phi\) using Equation (13). First, observe that \(D=([1]+D_1)\cup ([2]+D_2)\) where \(D_1=\lbrace [3]\rbrace \in \small{m}_{\small{trees}}(R\mathbin {\small {as}}x)\) and \(D_2=\lbrace [1,5], [2,9], [3,12]\rbrace \in \small{m}_{\small{trees}}(((S\mathbin {\small {as}}y)\ldots){+}\ldots)\) (we replace irrelevant parts with dots for brevity). Thus,
    \(\begin{equation*} \sigma ^{D, \Gamma , \emptyset }_\phi := \sigma ^{D_1, \Gamma , \emptyset }_{R\mathbin {\small {as}}x}\circ \sigma ^{D_2, \Gamma , \emptyset \circ \sigma ^{D_1, \Gamma , \emptyset }_{R\mathbin {\small {as}}x}}_{((S\mathbin {\small {as}}y)\ldots){+}\small {agr}(\ldots)\mathbin {\small {as}}x \mathbin {\small {filter}}\ldots } =\lbrace x/R(1,9)\rbrace \circ \sigma ^{D_2, \Gamma , \lbrace x/R(1,9)\rbrace }_{((S\mathbin {\small {as}}y)\ldots){+}\small {agr}(\ldots)\mathbin {\small {as}}x} \end{equation*}\)
    since \(\Gamma _3=R(1,9)\) and \(\sigma ^{B,\Gamma , \mu }_{\psi \mathbin {\small {filter}}P}=\sigma ^{B,\Gamma , \mu }_{\psi }\) . We further compute
    \(\begin{align*} &\sigma ^{D_2, \Gamma , \lbrace x/R(1,9)\rbrace }_{((S\mathbin {\small {as}}y)\ldots){+}\small {agr}(t_{\text{min}}:\text{min}(y\mathbin {.}t))\mathbin {\small {as}}x} =\sigma ^{\lbrace [1,5], [2,9], [3,12]\rbrace , \Gamma , \lbrace x/R(1,9)\rbrace }_{((S\mathbin {\small {as}}y)\ldots){+}\small {agr}(t_{\text{min}}:\text{min}(y\mathbin {.}t))\mathbin {\small {as}}x}\\ &\quad \quad =\Bigl \lbrace x/\!\!\small {agr}\bigl (⟦ \text{min}⟧ \bigl (\bigl [ ⟦ y\mathbin {.}t⟧ _{\lbrace x/R(1, 9)\rbrace \circ \sigma _{S\mathbin {\small {as}}y}^{\lbrace [5]\rbrace ,\Gamma , \lbrace x/R(1,9)\rbrace }}, ⟦ y\mathbin {.}t⟧ _{\lbrace x/R(1,9)\rbrace \circ \sigma _{S\mathbin {\small {as}}y}^{\lbrace [9]\rbrace ,\Gamma , \lbrace x/R(1,9)\rbrace }},\\ &\quad \quad \quad \quad \quad ⟦ y\mathbin {.}t⟧ _{\lbrace x/R(1,9)\rbrace \circ \sigma _{S\mathbin {\small {as}}y}^{\lbrace [12]\rbrace ,\Gamma , \lbrace x/R(1,9)\rbrace }} \bigr ]\bigr), \Gamma _{12}\mathbin {.}\tau \bigr)\Bigr \rbrace \\ &\quad \quad =\Bigl \lbrace x/\!\!\small {agr}\bigl (⟦ \text{min}⟧ \bigl (\bigl [ ⟦ y\mathbin {.}t⟧ _{\lbrace x/R(1,9),y/S(1, 5.0, 16)\rbrace }, ⟦ y\mathbin {.}t⟧ _{\lbrace x/R(1,9),y/S(1,4.1, 30)\rbrace },\\ &\quad \quad \quad \quad \quad ⟦ y\mathbin {.}t⟧ _{\lbrace x/R(1,9),y/S(1, 4.5, 49)\rbrace } \bigr ]\bigr), 49\bigr)\Bigr \rbrace \\ &\quad \quad =\Bigl \lbrace x/\!\!\small {agr}\bigl (⟦ \text{min}⟧ \bigl (\bigl [ 5.0,4.1,4.5 \bigr ]\bigr), 49\bigr)\Bigr \rbrace =\Bigl \lbrace x/\!\!\small {agr}\bigl (4.1, 49\bigr)\Bigr \rbrace . \end{align*}\)
    Thus, finally \(\sigma ^{D, \Gamma , \emptyset }_\phi :=\lbrace x/R(1,9)\rbrace \circ \bigl \lbrace x/\!\!\small {agr}(4.1,49)\bigr \rbrace =\bigl \lbrace x/\!\!\small {agr}(4.1,49)\bigr \rbrace .\)
    Let \(\phi\) be a pattern. We define a denotational semantics \(⟦ \phi ⟧ _{\Gamma ,\mu }^m\subseteq \small{m}_{\small{trees}}(\phi)\) of \(\phi\) to be a set of possible matchings of a stream with \(\phi\) , where each matching is represented by a matching tree in \(\small{m}_{\small{trees}}(\phi)\) . The semantics has three parameters:
    an ordered stream \(\Gamma :\mathbb {N}\rightarrow \small{Events}_\mathcal {U}\) (c.f. Definition 4.1) with which we match \(\phi\) ,
    a natural number \(m\in \mathbb {N}\) representing a position in \(\Gamma\) from which we start matching with \(\phi\)
    a substitution \(\mu\) such that \(\small{free}(\phi)\subseteq \small{Dom}(\mu)\) which fixes the values of free variables in \(\phi\) .
    The set \(⟦ \phi ⟧ _{\Gamma ,\mu }^m\) is defined by recursion on the structure of \(\phi\) as follows:
    \(\begin{align} ⟦ R\mathbin {\small {as}}x⟧ _{\Gamma ,\mu }^m&=⟦ R⟧ _{\Gamma ,\mu }^m :=\bigl \lbrace \lbrace [n]\rbrace \;|\;n\in \mathbb {N}\wedge n\ge m\wedge \small{Type}(\Gamma _n, R)\bigr \rbrace , \end{align}\)
    (14a)
    \(\begin{align} ⟦ \phi \mathbin {\small {filter}}P⟧ _{\Gamma ,\mu }^m&:=\bigl \lbrace D\in ⟦ \phi ⟧ _{\Gamma , \mu }^m\;|\; ⟦ P ⟧ _{\mu \circ \sigma ^{D, \Gamma , \mu }_\phi } \bigr \rbrace \end{align}\)
    (14b)
    \(\begin{align} ⟦ \phi _1\mathbin {\small {or}}\phi _2⟧ _{\Gamma ,\mu }^m&:= \bigl \lbrace [1]+D\;|\; D\in ⟦ \phi _1⟧ _{\Gamma ,\mu }^m\bigr \rbrace \cup \bigl \lbrace [2]+D\;|\; D\in ⟦ \phi _2⟧ _{\Gamma ,\mu }^m\bigr \rbrace , \end{align}\)
    (14c)
    \(\begin{align} ⟦ \phi _1\mathbin {\small {and}}\phi _2⟧ _{\Gamma ,\mu }^m&:= \bigl \lbrace ([1]+D_1)\cup ([2]+D_2)\;|\;D_1\in ⟦ \phi _1⟧ _{\Gamma , \mu }^m \wedge D_2\in ⟦ \phi _2⟧ _{\Gamma , \mu }^{m}\bigr \rbrace , \end{align}\)
    (14d)
    \(\begin{align} ⟦ \phi _1\mathbin {;}\phi _2⟧ _{\Gamma ,\mu }^m&:= \bigl \lbrace ([1]+D_1)\cup ([2]+D_2)\;|\;D_1\in ⟦ \phi _1⟧ _{\Gamma , \mu }^m \wedge D_2\in ⟦ \phi _2⟧ _{\Gamma , \mu \circ \sigma ^{D_1, \Gamma , \mu }_{\phi _1}}^{\Lambda (D_1)+1}\bigr \rbrace , \end{align}\)
    (14e)
    \(\begin{align} ⟦ \phi \mathbin {\small {noskip}}\psi ⟧ _{\Gamma ,\mu }^m &:=\bigl \lbrace D_1\in ⟦ \phi ⟧ _{\Gamma , \mu }^m\;|\; [m, \Lambda (D_1))\cap \lambda \bigl (\bigcup ⟦ \psi ⟧ _{\Gamma , \mu }^m\bigr)\subseteq \lambda (D_1)\bigr \rbrace , \end{align}\)
    (14f)
    \(\begin{align} ⟦ {\phi +}⟧ _{\Gamma , \mu }^m&=⟦ {\phi +}\small {agr}(\ldots)\mathbin {\small {as}}x⟧ _{\Gamma , \mu }^m :=\bigcup _{n=0}^\infty \Psi _{\Gamma , \mu }^{n,m}(\phi). \end{align}\)
    (14g)
    Above, \(\Psi _{\Gamma , \mu }^{n,m}(\phi)\) is a helper function which collects the results of exactly \(n+1\) iterations of \(\phi\) (indexed from \(0\) to \(n\) ). Thus,
    \(\begin{equation} \Psi _{\Gamma , \mu }^{n,m}(\phi):=\bigl \lbrace \textstyle \bigcup _{i=0}^n([i]+D_i)\;\big |\; D_0\in ⟦ \phi ⟧ _{\Gamma , \mu }^{m}\wedge D_1\in ⟦ \phi ⟧ _{\Gamma , \mu }^{\Lambda (D_0)+1}\wedge \cdots \wedge D_n\in ⟦ \phi ⟧ _{\Gamma , \mu }^{\Lambda (D_{n-1})+1}\bigr \rbrace . \end{equation}\)
    (15)
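    The following sketch implements Equations (13), (14) and (15) for the fragment \(R\mathbin {\small {as}}x\) , \(\mathbin {\small {filter}}\) , \(\mathbin {;}\) and \(\phi {+}\) (without aggregation), again over a finite stream prefix and over a hypothetical encoding as in the earlier sketches; iteration steps are indexed from 0, following Equation (15).
    # A sketch of the tree semantics; matching trees are frozensets of tuples (paths).
    def compose(s1, s2):
        out = dict(s1); out.update(s2); return out

    def last(D):
        """Lambda(D): the position of the last matched event in the tree D."""
        return max(p[-1] for p in D)

    def sigma_of(phi, D, stream, mu):
        """sigma^{D,Gamma,mu}_phi of Equation (13), restricted to this fragment."""
        op = phi[0]
        if op == 'as':
            (pos,) = D
            return {phi[2]: stream[pos[-1]]}
        if op == 'filter':
            return sigma_of(phi[1], D, stream, mu)
        if op == 'seq':
            D1 = frozenset(p[1:] for p in D if p[0] == 1)
            D2 = frozenset(p[1:] for p in D if p[0] == 2)
            s1 = sigma_of(phi[1], D1, stream, mu)
            return compose(s1, sigma_of(phi[2], D2, stream, compose(mu, s1)))
        if op == 'plus':
            return {}                            # phi+ binds no variables
        raise ValueError(op)

    def sem_tree(phi, stream, m, mu):
        op = phi[0]
        if op == 'as':
            return [frozenset({(n,)}) for n in range(m, len(stream))
                    if phi[1] == '?' or stream[n]['type'] == phi[1]]
        if op == 'filter':
            return [D for D in sem_tree(phi[1], stream, m, mu)
                    if phi[2](compose(mu, sigma_of(phi[1], D, stream, mu)))]
        if op == 'seq':
            out = []
            for D1 in sem_tree(phi[1], stream, m, mu):
                mu1 = compose(mu, sigma_of(phi[1], D1, stream, mu))
                for D2 in sem_tree(phi[2], stream, last(D1) + 1, mu1):
                    out.append(frozenset((1,) + p for p in D1) | frozenset((2,) + p for p in D2))
            return out
        if op == 'plus':
            def go(start, i):                    # one iteration at index i, then optionally more
                trees = []
                for D in sem_tree(phi[1], stream, start, mu):
                    step = frozenset((i,) + p for p in D)
                    trees.append(step)
                    trees += [step | rest for rest in go(last(D) + 1, i + 1)]
                return trees
            return go(m, 0)
        raise ValueError(op)

    stream = [{'type': 'R', 'id': 1}, {'type': 'S', 'id': 1}, {'type': 'S', 'id': 2}, {'type': 'S', 'id': 1}]
    phi = ('seq', ('as', 'R', 'x'),
           ('plus', ('filter', ('as', 'S', 'y'), lambda s: s['x']['id'] == s['y']['id'])))
    for D in sem_tree(phi, stream, 0, {}):
        print(sorted(D))
    # Three matching trees: x matched at position 0; the y-iterations cover {1}, {1, 3} or {3}.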
    We leave the proof of the following result about unanchored patterns (Definition 5.3) to the reader:
    Lemma 6.4.
    For any unanchored pattern \(\phi\) , each stream \(\Gamma\) , each substitution \(\mu\) such that \(\small{Dom}(\mu)\supseteq \small{free}(\phi)\) , and each \(m\in \mathbb {N}\) , we have \(⟦ \phi ⟧ _{\Gamma , \mu }^{n}\subseteq ⟦ \phi ⟧ _{\Gamma , \mu }^{m}\) for all \(n\ge m\) .
    Finally, let us note the following simple result which indicates that for closed patterns (i.e., patterns where \(\small{free}(\phi)=\emptyset\) ) substitution \(\mu\) is irrelevant for semantics:
    Lemma 6.5.
    Let \(\phi\) be a pattern. Then for any stream \(\Gamma\) , any substitution \(\mu\) , and \(m\in \mathbb {N}\) we have \(⟦ \phi ⟧ _{\Gamma , \mu }^m=⟦ \phi ⟧ _{\Gamma , \mu |_{\small{free}(\phi)}}^m\) . In particular, if \(\phi\) is closed, then \(⟦ \phi ⟧ _{\Gamma , \mu }^m=⟦ \phi ⟧ _{\Gamma , \emptyset }^m\) .
    Example 6.6.
    Consider closed pattern \(\phi\) ( \(\small{free}(\phi)=\emptyset\) ) defined in Example 6.2 and an event stream \(\Gamma\) and matching tree \(D\in \small{m}_{\small{trees}}(\phi)\) presented in Figure 7. We will prove that \(D\in ⟦ \phi ⟧ ^0_{\Gamma ,\emptyset }\) . Denote for brevity
    \(\begin{align*} \psi _1&:=\psi _2\mathbin {\small {filter}}x\mathbin {.}t_{\text{min}}\ge 4.0,\\ \psi _2&:=\psi _3{+}\small {agr}(t_{\text{min}}:\text{min}(y\mathbin {.}t))\mathbin {\small {as}}x,\\ \psi _3&:=S\mathbin {\small {as}}y\mathbin {\small {filter}}x\mathbin {.}\mathit {id}=y\mathbin {.}\mathit {id}. \end{align*}\)
    Thus, \(\phi =(R\mathbin {\small {as}}x)\mathbin {;}\psi _1\) . First, since \(D=([1]+D_1)\cup ([2]+D_2)\) where \(D_1=\lbrace [3]\rbrace \in \small{m}_{\small{trees}}(R\mathbin {\small {as}}x)\) and \(D_2=\lbrace [1,5], [2,9], [3,12]\rbrace \in \small{m}_{\small{trees}}(\psi _1)\) , it suffices, by Equation (14e), to show that \(D_1\in ⟦ R\mathbin {\small {as}}x⟧ _{\Gamma , \emptyset }^0\) and \(D_2\in ⟦ \psi _1⟧ _{\Gamma , \sigma ^{D_1, \Gamma , \emptyset }_{R\mathbin {\small {as}}x}}^{\Lambda (D_1)+1}\) . Now, \(D_1=\lbrace [3]\rbrace\) and \(\Gamma _3=R(1,9)\) . Hence, the first of these conditions is satisfied by Equation (14a). To prove the second assertion (about \(D_2\) ), recall from Example 6.3 that \(\sigma ^{D_1, \Gamma ,\emptyset }_{R\mathbin {\small {as}}x}=\lbrace x/R(1,9)\rbrace\) and \(\sigma ^{D_2, \Gamma , \sigma ^{D_1, \Gamma ,\emptyset }_{R\mathbin {\small {as}}x}}_{\psi _2}=\lbrace x/\!\!\small {agr}(4.1,49)\rbrace\) . Then, by Equation (14b), \(D_2\in ⟦ \psi _1⟧ _{\Gamma , \sigma ^{D_1, \Gamma , \emptyset }_{R\mathbin {\small {as}}x}}^{\Lambda (D_1)+1}\) iff \(D_2\in ⟦ \psi _2⟧ _{\Gamma , \sigma ^{D_1, \Gamma , \emptyset }_{R\mathbin {\small {as}}x}}^{\Lambda (D_1)+1}\) , since \(⟦ x\mathbin {.}t_{\text{min}}\ge 4.0⟧ _{\sigma ^{D_1, \Gamma ,\emptyset }_{R\mathbin {\small {as}}x}\circ \sigma ^{D_2, \Gamma , \sigma ^{D_1, \Gamma ,\emptyset }_{R\mathbin {\small {as}}x}}_{\psi _2}}= ⟦ x\mathbin {.}t_{\text{min}}\ge 4.0⟧ _{\lbrace x/R(1,9)\rbrace \circ \lbrace x/\!\!\small {agr}(4.1,49)\rbrace } =4.1\ge 4.0\) is satisfied. Because \(D_1=\lbrace [3]\rbrace\) and \(D_2=\lbrace [1,5], [2,9], [3,12]\rbrace\) , the unravelling of the iteration in Equation (15) yields that \(D_2\in ⟦ \psi _2⟧ _{\Gamma , \sigma ^{D_1, \Gamma , \emptyset }_{R\mathbin {\small {as}}x}}^{\Lambda (D_1)+1} =⟦ \psi _2⟧ _{\Gamma , \lbrace x/R(1,9)\rbrace }^{4}\) iff
    \(\begin{gather*} \lbrace [5]\rbrace \in ⟦ \psi _3⟧ ^4_{\Gamma , \lbrace x/R(1,9)\rbrace }\wedge \lbrace [9]\rbrace \in ⟦ \psi _3⟧ ^6_{\Gamma , \lbrace x/R(1,9)\rbrace }\wedge \lbrace [12]\rbrace \in ⟦ \psi _3⟧ ^{10}_{\Gamma , \lbrace x/R(1,9)\rbrace }. \end{gather*}\)
    We leave the rest of the verification to the reader.

    6.1 Relationship between Various Semantics

    Each element of the minimal semantics of a pattern consists of the position of the last matched event and a substitution mapping variables to the events bound to them during the corresponding matching. Both pieces of data can be easily extracted from a matching tree. Hence, there is an obvious way to map the tree semantics into the minimal semantics: we map each matching tree \(D\) from the tree semantics of a pattern \(\phi\) into a pair consisting of \(\Lambda (D)\) (the position of the last matched event) and the associated substitution. One can prove that we obtain exactly the minimal semantics of \(\phi\) in this way. Moreover, it is also clear that the mapping factors through the “intermediate” semantics \(⟦ \cdot ⟧ ^{\small{G}, m}_{\Gamma , \mu }\) (cf. [18]). More precisely, we can prove the following theorem:
    Theorem 6.7.
    Let \(\phi\) be a pattern in PatLang, \(\Gamma\) be a stream and \(\mu\) be a substitution such that \(\small{free}(\phi)\subseteq \small{Dom}(\mu)\) . For any \(D\in \small{m}_{\small{trees}}(\phi)\) define \(\Xi ^{\Gamma , \mu }_\phi (D):=(\lambda (D), \sigma ^{D, \Gamma , \mu }_\phi)\) . Then, for any \(m\in \mathbb {N}\) we have, \(⟦ \phi ⟧ ^{\small{G}, m}_{\Gamma , \mu }=\Xi ^{\Gamma , \mu }_\phi (⟦ \phi ⟧ _{\Gamma ,\mu }^m)\) . In particular, \(⟦ \phi ⟧ ^{\small{min}, m}_{\Gamma , \mu } =\lbrace (\max (X),\sigma)\;|\;(X,\sigma)\in \Xi ^{\Gamma , \mu }_\phi (⟦ \phi ⟧ _{\Gamma ,\mu }^m)\rbrace\) .
    The theorem can be proven by induction on the structure of \(\phi\) . The full proof can be found in Appendix A.1.

    7 Why We Need Tree Semantics

    This section contains examples demonstrating why the minimal semantics, like other semantics from prior articles [18], is insufficient to verify patterns against their intended meaning, and, if necessary, to correct them. More precisely, we consider examples where two matchings of the same events in the same stream (i.e., matchings which are differentiated neither by minimal semantics nor by the semantics from Equation (12), cf. [18]) are nevertheless differentiated by the tree semantics. We start with an abstract example:
    Fig. 8.
    Fig. 8. Example of a matching tree (see Example 7.1).
    Fig. 9.
    Fig. 9. Example of a matching tree (see Example 7.1).
    Example 7.1.
    Consider the pattern \(\phi :=((R\mathbin {\small {as}}x\mathbin {\small {filter}}x\mathbin {.}a\gt 0){+})\mathbin {;}(R{+})\) , and two of its matching trees (for the same stream) presented in Figures 8 and 9. The matchings described by those matching trees match exactly the same events. However, the event at position 5 in the stream is matched by the subpattern \(R{+}\) in the matching tree in Figure 8, and by the subpattern \((R\mathbin {\small {as}}x\mathbin {\small {filter}}x\mathbin {.}a\gt 0){+}\) in the matching tree in Figure 9. Thus, while the two matchings involve exactly the same events, the tree semantics assigns distinct meanings to them.
    Now we give another, this time concrete example:
    Example 7.2.
    Here we use an event schema from our motivating example from Section 2. Suppose we want to create a pattern recognizing the following situation: A given shelf (say, with identifier 1) displays a sufficiently low temperature for some period. Then it reports a series of items being taken out of this shelf, which implies it is opened frequently and possibly not closed quickly enough. At the same time it starts to report temperature higher than it is permitted for this shelf, so we want to alert the operator. Let our first attempt to describe this situation with PatLang be as follows:
    \(\begin{align*} \phi _1&:=(\psi _1\mathbin {\small {filter}}x\mathbin {.}\mathit {avg}\lt 10)\mathbin {;}(\\ &\quad \quad \quad (\psi _1\mathbin {\small {filter}}x\mathbin {.}\mathit {avg}\gt 12)\mathbin {\small {and}}(\text{get}\mathbin {\small {as}}y\mathbin {\small {filter}}y\mathbin {.}\mathit {sid} = 1){+}), \end{align*}\)
    where \(\psi _1:=(\text{shelf}\mathbin {\small {as}}x \mathbin {\small {filter}}x\mathbin {.}\mathit {sid} = 1){+} \small {agr}(\mathit {avg}:\mathbf {avg}(x\mathbin {.}\mathit {temp}))\mathbin {\small {as}}x\) . There are many things wrong with this solution, but we will concentrate on just one aspect of it. Suppose that we are using a tool which automatically generates example stream fragments matching the pattern. Suppose also, that this tool, like the one in [25], only annotates which events were matched, but not by which subpattern (thus, the tool is compatible with denotational semantics in the style of [18]). Let one of the matching stream fragments generated by the tool be as follows (matched events are coloured red, and we denote this stream as \(\Gamma\) in the remainder of this example):
    Clearly, this stream fragment, which matches \(\phi _1\) , indicates a slightly different situation than intended: we want a causal link between taking items from the shelf and the temperature increase. However, above, the temperature in shelf 1 started increasing (above 10) even before the first get-event. We think that we have recognized the error in \(\phi _1\) : we averaged temperatures, which means that if the temperature was mostly very low in the first period, the average will still be below 10 even if it later rose above 10. Thus, we change the average in the first aggregation to a maximum:
    \(\begin{equation*} \phi _2:=\psi _2\mathbin {;} (\psi _3\mathbin {\small {and}}(\text{get}\mathbin {\small {as}}y\mathbin {\small {filter}}y\mathbin {.}\mathit {sid} = 1){+}), \end{equation*}\)
    where
    \(\begin{align*} \psi _2&:= ((\text{shelf}\mathbin {\small {as}}x \mathbin {\small {filter}}x\mathbin {.}\mathit {sid} = 1){+} \small {agr}(\mathit {max}:\mathbf {max}(x\mathbin {.}\mathit {temp}))\mathbin {\small {as}}x) \mathbin {\small {filter}}x\mathbin {.}\mathit {max}\lt 10,\\ \psi _3&:=((\text{shelf}\mathbin {\small {as}}x \mathbin {\small {filter}}x\mathbin {.}\mathit {sid} = 1){+} \small {agr}(\mathit {avg}:\mathbf {avg}(x\mathbin {.}\mathit {temp}))\mathbin {\small {as}}x) \mathbin {\small {filter}}x\mathbin {.}\mathit {avg}\gt 12. \end{align*}\)
    Fig. 10.
    Fig. 10. Example of a matching tree (see Example 7.2).
    Fig. 11.
    Fig. 11. Example of a matching tree (see Example 7.2).
    However, the above stream fragment still matches \(\phi _2\) . Had the testing tool for the patterns provided full matching trees (i.e., our tree semantics) instead of just the matched events, the reason would be obvious. Consider the two matching trees in Figures 10 and 11, corresponding to matchings of \(\phi _1\) with the above stream \(\Gamma\) with identical sets of matched events. The only difference is that the event at position 6 is interpreted in the matching tree in Figure 10 as matched by the \(\psi _1\mathbin {\small {filter}}\ldots\) -part of \(\phi _1\) in the first component of the sequencing operator, while in the matching tree in Figure 11 the same event is interpreted as matched by the \(\psi _1\mathbin {\small {filter}}\ldots\) -part of \(\phi _1\) in the second component of the sequencing operator. The first tree corresponds to our initial assessment of the source of the error in the pattern \(\phi _1\) (relative to its intended interpretation), and indeed this matching tree does not belong to the tree semantics of \(\phi _2\) . However, the second matching tree (Figure 11) still belongs to \(⟦ \phi _2⟧ _{\Gamma , \emptyset }^0\) . Here, the error is that with patterns of the form \(\alpha _1\mathbin {\small {and}}\alpha _2\) we have no control over the relative positions of the events matched by \(\alpha _1\) and \(\alpha _2\) . Thus, the reporting of increased temperature can start before the first get-event.
    Finally, let us consider the following example (abstract to keep it simple) showing that, unlike our tree semantics, the rich auxiliary semantics of [20] (the one which associates a set of variable substitutions to a pattern) does not always provide sufficient information to verify the pattern against its intended meaning.
    Example 7.3.
    Assume that an event schema contains event type specifications \(R(a)\) and \(S(b)\) . Consider the pattern \(\phi :=\bigl ((R+)\;\small{all}\;S\bigr)+\) in the language of [20], and the following event stream \(\Gamma\) :
    \(\begin{array}{c|cccccccccccc} \text{position} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & \ldots \\ \hline \text{event} & S(1) & R(2) & R(1) & S(2) & R(3) & R(4) & R(1) & S(3) & R(1) & R(5) & R(2) & \ldots \end{array}\)
    It is immediate from [20, Figure 3] that \(\lbrace S/\lbrace 1,4,8\rbrace , R/\lbrace 2,3,5,6,7,9,10,11\rbrace \rbrace \in \lceil{\!\!}\lceil \phi \rfloor{\!\!}\rfloor (\Gamma , 1,11)\) . Unfortunately, this loses information about details of the matching, which, as in the examples above, may matter in some situations, e.g., the above matching can be interpreted as the following three consecutive matchings of \(\phi ^{\prime }:=(R+)\;\small{all}\;S\) :
    \(\begin{gather*} \lbrace S/\lbrace 1\rbrace , R/\lbrace 2,3\rbrace \rbrace \in \lceil{\!\!}\lceil \phi ^{\prime }\rfloor{\!\!}\rfloor (\Gamma , 1,3),\quad \lbrace S/\lbrace 4\rbrace , R/\lbrace 5,6,7\rbrace \rbrace \in \lceil{\!\!}\lceil \phi ^{\prime }\rfloor{\!\!}\rfloor (\Gamma , 4,7),\\ \lbrace S/\lbrace 8\rbrace , R/\lbrace 9,10,11\rbrace \rbrace \in \lceil{\!\!}\lceil \phi ^{\prime }\rfloor{\!\!}\rfloor (\Gamma , 8,11), \end{gather*}\)
    as well as the following three matchings of \(\phi ^{\prime }\) :
    \(\begin{gather*} \lbrace S/\lbrace 1\rbrace , R/\lbrace 2\rbrace \rbrace \in \lceil{\!\!}\lceil \phi ^{\prime }\rfloor{\!\!}\rfloor (\Gamma , 1,2),\quad \lbrace S/\lbrace 4\rbrace , R/\lbrace 3,5,6\rbrace \rbrace \in \lceil{\!\!}\lceil \phi ^{\prime }\rfloor{\!\!}\rfloor (\Gamma , 3,6),\\ \lbrace S/\lbrace 8\rbrace , R/\lbrace 7,9,10,11\rbrace \rbrace \in \lceil{\!\!}\lceil \phi ^{\prime }\rfloor{\!\!}\rfloor (\Gamma , 7,11). \end{gather*}\)

    8 Complexity Issues and Evaluation

    8.1 Complexity and Selection Strategies

    As is well recognized (see e.g., [29] for a detailed analysis), there is an inherent exponential complexity in matching patterns which support both the unrestricted skipping of relevant and irrelevant events and iteration. Consider, e.g., the following pattern:
    \(\begin{equation*} \phi _1:=(\mathit {put} \mathbin {\small {as}}x);(\phi _0{+} \small {agr}()\mathbin {\small {as}}z) \mathbin {\small {filter}}z.\tau \lt x.\tau +20, \end{equation*}\)
    where \(\phi _0:=\mathit {shelf}\mathbin {\small {as}}y \mathbin {\small {filter}}x.\mathit {id}=y.\mathit {id}\wedge y.\mathit {temp}\gt 10\) . For each \(\mathit {put}\) -event \(x\) in the event stream, if there are \(n\) \(\mathit {shelf}\) -events \(y\) such that \(x.\mathit {id}=y.\mathit {id}\) , \(y.\mathit {temp}\gt 10\) , and \(y.\tau \lt x.\tau +20\) , then the matching would generate \(2^n-1\) solutions. This can easily become intractable in either memory or processing steps required, for even modest values of \(n\) . Moreover, even if the implementation is made to share the sub-matchings as much as possible (which helps, but the resources needed are still exponential in the general case, cf. [28]), the “client” of the matcher will still be overwhelmed by exponentially many returned complex events. Hence, we need to restrict the number of potential matchings through the use of selection strategies. A frequently used selection strategy is skip-till-next-match ([29], cf. NXT strategy from [18]) which permits skipping only of irrelevant (at the moment) events, and greedily matches those events which can be matched. Details depend on the author and pattern language (c.f., [18, 29]). In case of PatLang, we can enforce the skip-till-next-match selection strategy using the \(\mathbin {\small {noskip}}\) construct. For example, we can rewrite \(\phi _1\) as follows:
    \(\begin{align*} \phi _2:=&(\mathit {put} \mathbin {\small {as}}x \mathbin {\small {noskip}}\mathit {put} \mathbin {\small {as}}x^{\prime });((\phi _0{+} \small {agr}()\mathbin {\small {as}}z \mathbin {\small {noskip}}\phi _0)\mathbin {\small {filter}}z.\tau \lt x.\tau +20). \end{align*}\)
    Observe that now, for each \(\mathit {put}\) -event \(x\) in the event stream, if there are \(n\) \(\mathit {shelf}\) -events \(y\) such that \(x.\mathit {id}=y.\mathit {id}\) , \(y.\mathit {temp}\gt 10\) , and \(y.\tau \lt x.\tau +20\) , then the matching would generate \(n\) complex events.
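    To make the two counts concrete, the following sketch (purely illustrative Python, unrelated to the Prolog implementation; it assumes that exactly \(n\) \(\mathit {shelf}\) -events satisfy the filter and that nothing else constrains the iteration) enumerates the matchings of the iteration for a single \(\mathit {put}\) -event: with unrestricted skipping, every non-empty subset of the relevant \(\mathit {shelf}\) -events (taken in stream order) yields a matching, \(2^n-1\) of them, whereas under the \(\mathbin {\small {noskip}}\) rewriting only the \(n\) prefixes remain.

```python
from itertools import combinations

def matchings_with_skipping(shelf_positions):
    """With unrestricted skipping: every non-empty subset of the relevant
    shelf-events yields a distinct matching, i.e., 2^n - 1 matchings."""
    out = []
    for k in range(1, len(shelf_positions) + 1):
        out.extend(combinations(shelf_positions, k))
    return out

def matchings_with_noskip(shelf_positions):
    """Under skip-till-next-match the iteration may not skip a relevant event,
    so only the n prefixes of the relevant events remain as matchings."""
    return [tuple(shelf_positions[:k]) for k in range(1, len(shelf_positions) + 1)]

if __name__ == "__main__":
    shelf = [3, 5, 8, 11]   # hypothetical stream positions of matching shelf-events
    assert len(matchings_with_skipping(shelf)) == 2 ** len(shelf) - 1   # 15
    assert len(matchings_with_noskip(shelf)) == len(shelf)              # 4
```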
    We call a pattern \(\phi\) deterministic if, after the first matched event, the matching of \(\phi\) is completely determined by the events in the stream. Thus, for a given stream, each potential matching of a deterministic pattern can be extended to at most one complete matching (or be proven to be a dead end). Consequently, for an unanchored, deterministic pattern \(\phi\) , if on average \(n\) events matching the start of the pattern are emitted within the length of the time window of \(\phi\) , then when matching \(\phi\) we have to deal with at most \(n\) simultaneous potential matchings [29], which is, at least theoretically, manageable. Unfortunately, expressive pattern languages usually permit the creation of non-deterministic patterns. In the case of PatLang and many similar pattern languages (see e.g., [18, 28, 29]) there are three basic sources of non-determinism: skipping (as described above), alternative ( \(\_\mathbin {\small {or}}\_\) ), and iteration ( \((\cdot){+}\) and \((\cdot){+} \small {agr}(\ldots)\mathbin {\small {as}}x\) ). Patterns with the alternative can be deterministic: consider, e.g., the pattern \((A \mathbin {\small {or}}B) \mathbin {\small {noskip}}(A\mathbin {\small {or}}B)\) . However, making a pattern deterministic in this way may lead to an unacceptable loss of matches. For example, let \(\psi _1:=(A\mathbin {;}C)\mathbin {\small {or}}(B\mathbin {;}D)\) and \(\psi _2:=(A\mathbin {\small {or}}B)\mathbin {\small {or}}(C\mathbin {\small {or}}D)\) . Then the non-deterministic \(\psi _1\) matches the stream fragment \([A, B, C]\) , while the deterministic \(\psi _1\mathbin {\small {noskip}}\psi _2\) does not, which is probably not the intention. On the other hand, the pattern \(\psi _3:=(A\mathbin {;} (C\mathbin {\small {noskip}}C))\mathbin {\small {or}}(B\mathbin {;} (D\mathbin {\small {noskip}}D))\) has two distinct matches with the stream fragment \([A,B,C,D]\) . Note that \(\psi _3\) is deterministic in each branch, but the branches can still be chosen non-deterministically. When combined with iteration (e.g., \(\psi _3{+}\) ), this can again lead to an exponential explosion in the number of matchings.
    In the case of iteration, the source of non-determinism is the decision of when to stop. It is not unreasonable to want to report the complex event as soon as possible (cf. the LAST selection strategy in [18]). In particular, we might want to stop the given iteration as soon as possible. What “as soon as possible” means depends on the situation. Sometimes it is simple: if we want to match “at least three occurrences of \(\phi\) ”, then we can stop immediately after the third occurrence. There is no need to extend the iteration further, unless we want to know exactly how many occurrences of \(\phi\) fell within the time window, in which case some sort of “maximal” strategy would be called for (cf. the MAX strategy in [18]). For example, in order to report “the exact number of (at least three) occurrences of \(R\) -events within 20 units of time” we can use the following deterministic pattern:
    \(\begin{align*} &(R\mathbin {\small {as}}x)\mathbin {;}(((R\mathbin {\small {as}}y\mathbin {\small {filter}}y.\tau \lt x.\tau +20) \mathbin {\small {noskip}}(R\mathbin {\small {as}}z\mathbin {\small {filter}}z.\tau \ge x.\tau +20)){+}\\ &\quad \quad \quad \small {agr}(c:\mathbf {count}(*))\mathbin {\small {as}}t \mathbin {\small {filter}}t.c\gt 1)\\ &\quad \mathbin {;} ((?\mathbin {\small {as}}s \mathbin {\small {filter}}s.\tau \ge x.\tau +20)\mathbin {\small {noskip}}(?\mathbin {\small {as}}s \mathbin {\small {filter}}s.\tau \ge x.\tau +20)) \end{align*}\)
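    Here the initial \(R\) -event together with the iteration, whose count must exceed one ( \(\mathbin {\small {filter}}t.c\gt 1\) ), accounts for the required three occurrences, while the final \(\mathbin {\small {noskip}}\) -guarded event detects that the time window has elapsed. For intuition, the following Python sketch approximates the intended behaviour of such a pattern (it is not the PatLang matching algorithm): for each \(R\) -event it reports the exact number of \(R\) -events falling within the next 20 time units, provided that number is at least three and some later event confirms that the window is over.

```python
def late_window_counts(stream, window=20, min_count=3):
    """stream: list of (event_type, timestamp) pairs, ordered by timestamp.
    For each R-event x, count the R-events with timestamp in [x.tau, x.tau + window);
    report (index of x, count) once some event at or after x.tau + window confirms
    that the window is complete and the count is at least min_count."""
    results = []
    for i, (typ, tau) in enumerate(stream):
        if typ != "R":
            continue
        window_closed = any(t2 >= tau + window for (_, t2) in stream[i:])
        count = sum(1 for (t, t2) in stream[i:] if t == "R" and t2 < tau + window)
        if window_closed and count >= min_count:
            results.append((i, count))
    return results

# Example: R-events at times 0, 5, 12 fall within one 20-unit window; the event
# at time 25 closes the window, so the first R reports a count of 3.
example = [("R", 0), ("S", 3), ("R", 5), ("R", 12), ("S", 25)]
assert late_window_counts(example) == [(0, 3)]
```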
    In general, making a pattern deterministic, or even stopping the iteration “as soon as possible”, may not be easy, or may not be possible at all. Sometimes the situation to match is complex enough that it may require non-deterministically exploring many paths before a correct matching is found. In those cases, even if a large number of complex events (which is usually not desirable) is not reported to the user, the implementation may still have to generate a large number of partial matchings, most of which will be rejected. We believe that this is unavoidable unless one is willing to severely reduce the expressiveness of the pattern language. However, as we demonstrated above, in many cases PatLang permits the rewriting of patterns in such a way that they become deterministic enough to avoid an exponential explosion of solutions and to make practical applications viable. This, of course, further underscores the need for tools for automated exploration and testing of patterns, such as [30], described briefly in the next subsection.

    8.2 PatternExplorer Tool

    We implemented in Prolog a tool for the exploration and testing of patterns called PatternExplorer (the sources and extended documentation are available from [30]). The tool implements PatLang, with some inconsequential changes to the syntax, in accordance with the tree semantics. We intend to prove the correctness of the implementation in future work. The tool generates partially instantiated (symbolic) streams. More precisely, they contain either events with partially instantiated and constrained attributes or constrained variables representing skipped events. In the latter case, the constraints encode which events can be skipped. More interestingly, PatternExplorer permits finding streams which match two patterns at the same time (with the possibility of correlating the two patterns by passing variables bound by the first pattern to the second one). If we use the second pattern to express some arrangement of events which we do not want to co-occur with a matching of the first pattern, and we find streams which satisfy both patterns, then we can infer that the first pattern is faulty. On the other hand, if we find no streams satisfying both patterns, then this is usually just a partial verification, since to keep the execution finite we always limit the size of the constructed stream fragments. The generated matching stream fragment records, for each matched event, which part of the pattern(s) it matched. In the web GUI of the tool (see Figure 12), this information is presented in a human-readable form as an appropriate pattern fragment for each matched event. In the API, the matching predicate effectively returns a representation of a matching tree.
    Fig. 12.
    Fig. 12. Screenshot from the web interface of the PatternExplorer tool presenting a partially instantiated stream matching both pattern 31 and pattern 32 (presented below the screenshot). As patterns are represented by actual Prolog terms, we introduced some syntax changes (e.g., \(x.\mathit {id}\) is represented by the term ref(X,id), driver_in \(\mathbin {\small {as}}\) x is represented by event(driver_in, X), \(\mathbin {\small {filter}}\) is no longer infix, and so on). The box at the bottom contains constraints attached to variables in the symbolic stream.

    8.3 Experimental Evaluation

    For evaluation we consider a public transport management system (cf. [4]) in which the schema of “bus” events generated by buses and sensors placed along the routes contains (among others) the following event type specifications (see also the sketch after the list):
    \(\mathit {enter}(\mathit {id}, \mathit {sid}, \delta , \tau)\) / \(\mathit {leave}(\mathit {id},\mathit {sid},\delta , \tau)\) (the bus \(\mathit {id}\) enters/leaves the stop identified by \(\mathit {sid}\) at time \(\tau\) , where the scheduled time is \(\tau -\delta\) ),
    abrupt_accel \((\mathit {id}, \tau)\) / abrupt_decel \((\mathit {id}, \tau)\) (abrupt acceleration/deceleration of bus \(\mathit {id}\) at \(\tau\) ),
    sharp_turn \((\mathit {id}, d, \tau)\) (bus \(\mathit {id}\) made a sharp turn to \(d\in \lbrace \text{left}, \text{right}\rbrace\) at \(\tau\) ),
    driver_in \((\mathit {id}, \mathit {did}, \tau)\) and driver_out \((\mathit {id}, \mathit {did}, \tau)\) (driver identified by \(\mathit {did}\) signs in/out of the bus \(\mathit {id}\) at time \(\tau\) ).
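    For concreteness, the event types above can be rendered as plain records; the following Python sketch is merely illustrative and is unrelated to the Prolog term representation used by PatternExplorer.

```python
from typing import NamedTuple, Literal

class Enter(NamedTuple):          # enter(id, sid, delta, tau)
    id: int; sid: int; delta: int; tau: int

class Leave(NamedTuple):          # leave(id, sid, delta, tau)
    id: int; sid: int; delta: int; tau: int

class AbruptAccel(NamedTuple):    # abrupt_accel(id, tau)
    id: int; tau: int

class AbruptDecel(NamedTuple):    # abrupt_decel(id, tau)
    id: int; tau: int

class SharpTurn(NamedTuple):      # sharp_turn(id, d, tau), d in {left, right}
    id: int; d: Literal["left", "right"]; tau: int

class DriverIn(NamedTuple):       # driver_in(id, did, tau)
    id: int; did: int; tau: int

class DriverOut(NamedTuple):      # driver_out(id, did, tau)
    id: int; did: int; tau: int

# A stream is then simply a list of such events ordered by tau, e.g.:
stream = [DriverIn(id=1, did=2, tau=0), Enter(id=1, sid=7, delta=130, tau=40)]
```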
    Figure 13 contains two patterns parametrized by \(p\) and \(\Delta\) which are supposed to match a situation where a bus arrives late at a bus stop (by at least 120 time units), and then, before arriving at the next stop, the driver engages in dangerous driving, defined as \(p\) incidents of abrupt acceleration, abrupt deceleration, or sharp turns within \(\Delta\) time units. Both patterns try to ensure that the driver who engages in dangerous driving is the same driver who arrived late. The patterns in Figure 13 are the same, except that
    (1)
    Pattern 2 prevents skipping of relevant abrupt_accel, abrupt_decel, and sharp_turn-events inside the iteration, thus forcing the skip-till-next-match selection strategy (skipping of other “matchable” events is forbidden in both patterns).
    (2)
    Pattern 2 also does not allow skipping, inside the iteration, of any events later than the time window, in order to prevent an iteration step from continuing outside the time window.
    Fig. 13.
    Fig. 13. Patterns used for evaluation ( \(p\) and \(\Delta\) are parameters), where, to make the formula more readable, we denoted, for any variable \(x\) , \(\mathbf {danger}(x):=({abrupt_accel}\mathbin {\small {as}}x)\mathbin {\small {or}}({abrupt_decel}\mathbin {\small {as}}x)\mathbin {\small {or}}({sharp_turn}\mathbin {\small {as}}x)\) .
    Matching with the second pattern should be more efficient and scalable. Also, the second pattern better captures the intention of discovering “dangerous driving incidents after being late to the bus stop”, as it avoids reporting spurious complex events corresponding to different choices of “dangerous driving” indicators within the same dangerous driving instance. To evaluate the effect the differences between Pattern 1 and Pattern 2 have on matching time, we used the following experimental setup:
    We used a late-2013 13-inch MacBook Pro with 16 GB RAM and a 2.8 GHz dual-core Intel i7 processor.
    PatternExplorer compiles patterns to special non-deterministic automata with variables (cf. [18, 25]). Since our tool was designed to create symbolic streams which match the given pattern, instead of actually matching a pattern against a stream, its execution engine is overly complicated, with additional branches created to narrow and constrain variables instead of simply consuming events. Hence, for the purpose of this test, we created a simple, basic implementation of the automaton execution engine which expects only fully instantiated streams.
    For each pattern we selected a number of combinations of the required number of iterations \(p\) and the stream size. We chose \(\Delta\) (the time window size) proportional to \(p\) (either \(20p\) or \(40p\) ) to ensure that in most cases the time window is large enough to fit the required number of iteration steps.
    For each choice of pattern, \(p\) , \(\Delta\) , and stream length, ten streams of this length were randomly generated. The process of random stream generation was as follows (a code sketch follows the list):
    (1)
    First, we randomly draw the event type.
    (2)
    Then we draw the values of all non-time attributes of the selected event. Each attribute name has a specific small set from which its values are drawn ( \(\mathit {id}, \mathit {sid},\mathit {did}\in \lbrace 1,2,3,4\rbrace\) , \(\delta \in [1,200]\) , \(\mathit {dir}\in \lbrace \text{left}, \text{right}\rbrace\) ). The \(i\) th event \(E_i\) is assigned the timestamp \(\tau _i:=\tau _{i-1}+d\) , where \(d\) is drawn from \([1,20]\) .
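    The following Python sketch (illustrative only; our actual generator is part of the Prolog test harness) implements this procedure with the attribute domains listed above, assuming a uniform choice among the seven event types.

```python
import random

ATTRS = {                      # non-time attributes of each event type
    "enter":        ["id", "sid", "delta"],
    "leave":        ["id", "sid", "delta"],
    "abrupt_accel": ["id"],
    "abrupt_decel": ["id"],
    "sharp_turn":   ["id", "dir"],
    "driver_in":    ["id", "did"],
    "driver_out":   ["id", "did"],
}

def draw_value(attr):
    if attr in ("id", "sid", "did"):
        return random.randint(1, 4)        # id, sid, did in {1, 2, 3, 4}
    if attr == "delta":
        return random.randint(1, 200)      # delta in [1, 200]
    if attr == "dir":
        return random.choice(["left", "right"])
    raise ValueError(attr)

def random_stream(length):
    stream, tau = [], 0
    for _ in range(length):
        etype = random.choice(list(ATTRS))                 # (1) draw the event type
        event = {a: draw_value(a) for a in ATTRS[etype]}   # (2) draw non-time attributes
        tau += random.randint(1, 20)                       # tau_i = tau_{i-1} + d, d in [1, 20]
        event.update(type=etype, tau=tau)
        stream.append(event)
    return stream

# e.g., ten streams of 500 events each for one experimental configuration
streams = [random_stream(500) for _ in range(10)]
```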
    For each generated stream we run the matching with the respective pattern. We measure the number of matchings per stream and the total CPU time spent on each stream. The matching implementation does not use any parallelization or sharing of partial matchings. Instead, it uses Prolog backtracking to generate all of the matchings. Since we utilize constant memory, processor time is the only resource expended.
    Fig. 14.
    Fig. 14. Evaluation of matching time per single matching with respect to the event stream size and pattern parameters.
    Figures 14 and 15 present the average results and standard deviations for each combination of a pattern and its parameters. Unfortunately, in the case of Pattern 1 some CPU times were so large that we introduced timeouts at 60 s. Thus, the CPU times in Figure 14, which include matchings which timed out, are likely to be even larger in reality. The number of matchings in Figure 15 does not include matchings which timed out. There were no timeouts for Pattern 2. The distribution of timeouts with respect to \(p\) and \(\Delta\) is as follows: . Based on Figures 14 and 15, we can observe that
    The total matching time for Pattern 2 scales practically linearly with the stream size and seems to be independent of \(p\) and the window size. This is to be expected, since once the first matched event of the pattern is decided, the rest of the matching is completely deterministic.
    In the case of Pattern 1, it is hard to infer any trend, since the standard deviations are so large. In any case, while many random streams were matched very efficiently, some matching times were too large for this kind of matching to be practical at scale.
    The number of matchings for Pattern 2 is significantly smaller than for Pattern 1. It also seems to increase linearly with the size of the stream, but the increase is slower for larger \(p\) ’s. This is understandable, since longer sequences of matching events are less probable (within the given time window) for random streams.
    For Pattern 1, reversing the dependence observed for Pattern 2, there are far more matchings for \(p=3\) than for \(p=1\) . This is caused by the exponential explosion of matchings when the possibility of skipping relevant events is not limited.
    The tests and the raw experimental data are also available from [30].
    Fig. 15.
    Fig. 15. Number of solutions found with respect to event stream size and pattern parameters.

    9 Final Remarks

    We presented a new stream query language PatLang with two distinct formal denotational semantics: minimal and tree. Both the language and its semantics were inspired by [18, 19, 20]. We combine and modify the syntaxes from both articles to make them more user-friendly, and we also add some additional constructs such as skipping restrictions or explicit aggregation. Currently, PatLang does not include explicit selection strategies (cf. [18, 20]). Note, however, that the skip-till-next-match and strict selection strategies can be implemented by rewriting the pattern using the appropriate skipping restrictions supported by PatLang. We designed two distinct denotational semantics to satisfy contradictory requirements: minimal semantics is as simple as possible while still describing when a pattern matches and which bindings of variables are induced by the matching. Tree semantics provides detailed information about which parts of the stream match with each subpattern of the given pattern, i.e., information about the interpretation of events in the event stream induced by the pattern matching. The tree semantics is unnecessary for verifying the correctness of pattern matching execution. However, we demonstrated using practical examples that neither the minimal semantics nor the somewhat more complex semantics from prior work suffices to effectively locate errors in certain patterns (with respect to their intended meaning), and that the additional information provided by the tree semantics is crucial for that purpose. Unfortunately, unlike the minimal semantics, the tree semantics tightly couples the interpretation of a pattern with the pattern itself, which makes it unsuitable for, say, defining semantic equivalence between patterns (this unsuitability was the primary rationale for designing the minimal semantics). Thus, we need both semantics, depending on the application, and neither is redundant. The two semantics are, however, not independent, and we proved that the tree semantics can be mapped to the minimal semantics.
    Unlike in [18], we treat patterns of the form \(R\mathbin {\small {as}}x\) (as well as aggregations) as variable binders ([18] does not have variable binders: the meaning \(⟦ R\mathbin {\small {as}}x⟧ _{\nu , \Gamma }\) is defined relative to the substitution \(\nu\) , which must assign to \(x\) the position in the stream \(\Gamma\) of the matched event, i.e., in the pattern \(R\mathbin {\small {as}}x\) the variable \(x\) is treated by the semantics as free). This leads to rather exotic scoping and shadowing rules, which we explore in Section 4. The author was pleased to discover that such “non-local” scoping, where scopes are not neatly nested as in first-order logic, is not without precedent. Of particular relevance here is DPL [21], which has applications in modelling natural language discourse: in DPL, sentences are relations between variable substitutions which may introduce new bindings. This is not unlike PatLang, where each pattern, given an initial substitution (and an event stream), generates a new variable substitution. We would like to explore possible connections between DPL and PatLang (and other stream query and pattern languages) in future work.
    We are currently developing (cf. [25]) an executable semantics for the language presented here. The executable semantics, based on term rewriting [5], is defined in such a way as to be equivalent to the tree semantics presented here. More interestingly, a practical executable semantics should permit symbolic execution [8, 23] to explore, in a finite way, all the possible outputs for the infinity of possible input streams. The use of term rewriting is particularly fitting here, since term rewriting comes with a natural framework for symbolic execution in the form of narrowing [13, 17], which has been successfully used for verification (see, e.g., [27]). The proof-of-concept pattern exploration tool based on these ideas (PatternExplorer) is already available [30], together with extensive documentation, but we defer the presentation of implementation details and the proof of equivalence of the tree and executable semantics to future work. That said, as a basic sanity check, we presented here an evaluation using our tool, which shows that in many cases our language permits rewriting patterns so that they avoid an exponential explosion of the computational cost and of the number of spurious matchings.

    Acknowledgments

    The author is grateful to the reviewers for their helpful remarks and suggestions.

    Footnotes

    1
    We are grateful to the reviewers for pointing out this possibility.
    2
    We thank the anonymous reviewer for pointing out that the implication in the Lemma cannot be extended to equivalence, and suggesting the above counterexample.

    A Proofs

    A.1 Proof of Theorem 6.7

    We prove the theorem by fairly straightforward induction on the structure of \(\phi\) . By Equations (14a), (12b), and (13), respectively, we have \(⟦ R\mathbin {\small {as}}x⟧ _{\Gamma ,\mu }^m :=\bigl \lbrace \lbrace [n]\rbrace \;|\;P^m_\Gamma (n)\bigr \rbrace\) and
    \(\begin{equation*} ⟦ R\mathbin {\small {as}}x⟧ ^{\small{G}, m}_{\Gamma , \mu } :=\bigl \lbrace (\lbrace n\rbrace , \lbrace x/\Gamma _n\rbrace)\;|\;P^m_\Gamma (n)\bigr \rbrace ,\quad \sigma ^{\lbrace [n]\rbrace , \Gamma , \mu }_{R\mathbin {\small {as}}x}:=\lbrace x/\Gamma _n\rbrace . \end{equation*}\)
    Above, we denoted for brevity \(P^m_\Gamma (n):= n\ge m\wedge \small{Type}(\Gamma _n,R)\) . Thus, it follows immediately that \(⟦ R\mathbin {\small {as}}x⟧ ^{\small{G}, m}_{\Gamma , \mu }=\Xi ^{\Gamma , \mu }_\phi \bigl (⟦ R\mathbin {\small {as}}x⟧ _{\Gamma ,\mu }^m\bigr)\) .
    Next, let us consider the case of \(\phi \mathbin {\small {filter}}P\) . By Equation (13) we have \(\sigma ^{D,\Gamma , \mu }_{\phi \mathbin {\small {filter}}P}=\sigma ^{D,\Gamma ,\mu }_\phi\) . Then by Equations (14b) and (12c) and by inductive assumption we have
    \(\begin{align*} \bigl \lbrace \Xi ^{\Gamma , \mu }_{\phi \mathbin {\small {filter}}P}(D)\;|\; D\in ⟦ \phi \mathbin {\small {filter}}P⟧ _{\Gamma ,\mu }^m\bigr \rbrace & = \bigl \lbrace (\lambda (D),\sigma ^{D,\Gamma ,\mu }_\phi)\;|\; D\in ⟦ \phi ⟧ _{\Gamma ,\mu }^m \wedge ⟦ P⟧ _{\mu \circ \sigma ^{D,\Gamma ,\mu }_\phi }\bigr \rbrace \\ & = \lbrace (X,\sigma)\in ⟦ \phi ⟧ _{\Gamma ,\mu }^{\small{G}, m}\;|\; ⟦ P⟧ _{\mu \circ \sigma }\rbrace = ⟦ \phi \mathbin {\small {filter}}P⟧ _{\Gamma ,\mu }^{\small{G}, m}. \end{align*}\)
    Let us now consider the case of \(\phi _1\mathbin {\small {or}}\phi _2\) . By Equation (13) we have
    \(\[ \sigma ^{[i]+D, \Gamma , \mu }_{\phi _1\mathbin {\small {or}}\phi _2} :=\sigma ^{D, \Gamma , \mu }_{\phi _i}\big |_{\mathfrak {B}_v(\phi _1)\cap \mathfrak {B}_v(\phi _2)}. \]\)
    Then it follows by inductive assumption and Equations (12d) and (14c) that
    \(\begin{align*} \bigl \lbrace \Xi ^{\Gamma , \mu }_{\phi _1\mathbin {\small {or}}\phi _2}(D)\;|\; D\in ⟦ \phi _1\mathbin {\small {or}}\phi _2⟧ _{\Gamma ,\mu }^m\bigr \rbrace & = \bigcup _{i\in \lbrace 1,2\rbrace }\bigl \lbrace (\lambda (D_i),\sigma ^{D_i, \Gamma , \mu }_{\phi _i}\big |_{\mathfrak {B}_v(\phi _1)\cap \mathfrak {B}_v(\phi _2)})\;|\;D_i\in ⟦ \phi _i⟧ _{\Gamma ,\mu }^m\bigr \rbrace \\ & = \bigcup _{i\in \lbrace 1,2\rbrace }\bigl \lbrace (X, \sigma \big |_{\mathfrak {B}_v(\phi _1)\cap \mathfrak {B}_v(\phi _2)})\;|\; (X,\sigma)\in ⟦ \phi _i⟧ _{\Gamma ,\mu }^{\small{G},m}\bigr \rbrace \\ & =⟦ \phi _1\mathbin {\small {or}}\phi _2⟧ _{\Gamma ,\mu }^{\small{G},m}. \end{align*}\)
    Next, let us consider the case of \(\phi _1\mathbin {;}\phi _2\) . By Equation (14e) we have:
    \(\begin{align*} \bigl \lbrace \Xi ^{\Gamma , \mu }_{\phi _1\mathbin {;}\phi _2}(D)\;|\; D & \in ⟦ \phi _1\mathbin {;}\phi _2⟧ _{\Gamma ,\mu }^m\bigr \rbrace = \bigl \lbrace \Xi ^{\Gamma , \mu }_\phi \bigl (([1]+D_1)\cup ([2]+D_2)\bigr)\;|\;D_1\\ & \in ⟦ \phi _1⟧ _{\Gamma , \mu }^m \wedge D_2\in ⟦ \phi _2⟧ _{\Gamma , \mu \circ \sigma ^{D_1, \Gamma , \mu }_{\phi _1}}^{\Lambda (D_1)+1}\bigr \rbrace =:(1). \end{align*}\)
    By Equation (13) we have \(\sigma ^{([1]+D_1)\cup ([2]+D_2), \Gamma , \mu }_{\phi _1\mathbin {;}\phi _2} :=\sigma ^{D_1, \Gamma , \mu }_{\phi _1}\circ \sigma ^{D_2, \Gamma , \mu \circ \sigma ^{D_1, \Gamma , \mu }_{\phi _1}}_{\phi _2}\) . Next,
    \(\[ \lambda \bigl (([1]+D_1)\cup ([2]+D_2)\bigr)=\lambda (D_1)\cup \lambda (D_2), \]\)
    hence
    \(\begin{align*} (1) & =\bigl \lbrace \bigl (\lambda (D_1)\cup \lambda (D_2), \sigma ^{D_1, \Gamma , \mu }_{\phi _1}\circ \sigma ^{D_2, \Gamma , \mu \circ \sigma ^{D_1, \Gamma , \mu }_{\phi _1}}_{\phi _2}\bigr) \;|\; D_1\in ⟦ \phi _1⟧ _{\Gamma , \mu }^m \wedge D_2\in ⟦ \phi _2⟧ _{\Gamma , \mu \circ \sigma ^{D_1, \Gamma , \mu }_{\phi _1}}^{\Lambda (D_1)+1}\bigr \rbrace \\ & =\bigl \lbrace (X_1\cup X_2, \sigma _1\circ \sigma _2)\;|\;(X_1, \sigma _1)\in \Xi ^{\Gamma , \mu }_{\phi _1}(⟦ \phi _1⟧ _{\Gamma , \mu }^m)\wedge (X_2, \sigma _2)\in \Xi ^{\Gamma , \mu \circ \sigma _1}_{\phi _2}(⟦ \phi _2⟧ _{\Gamma , \mu \circ \sigma _1}^{\max (X_1)+1})\bigr \rbrace =:(2) \end{align*}\)
    Now we can use our inductive assumption and Equation (12f) to compute
    \(\begin{equation*} (2)=\bigl \lbrace (X_1\cup X_2, \sigma _1\circ \sigma _2)\;|\;(X_1, \sigma _1)\in ⟦ \phi _1⟧ _{\Gamma , \mu }^{\small{G},m}\wedge (X_2, \sigma _2)\in ⟦ \phi _2⟧ _{\Gamma , \mu \circ \sigma _1}^{\small{G},\max (X_1)+1}\bigr \rbrace = ⟦ \phi _1\mathbin {;}\phi _2⟧ ^{\small{G}, m}_{\Gamma , \mu }. \end{equation*}\)
    The proof for the case of \(\phi _1\mathbin {\small {and}}\phi _2\) is similar to the proof for the case \(\phi _1\mathbin {;}\phi _2\) and we leave it to the reader. Consider now the case of \(\phi \mathbin {\small {noskip}}\psi\) . By Equation (13) we have \(\Xi ^{\Gamma , \mu }_{\phi \mathbin {\small {noskip}}\psi }(D)=(\lambda (D), \sigma ^{D, \Gamma , \mu }_{\phi \mathbin {\small {noskip}}\psi })=(\lambda (D), \sigma ^{D, \Gamma , \mu }_{\phi })=\Xi ^{\Gamma , \mu }_{\phi }(D)\) . Also, \(\textstyle \bigcup \pi _1(⟦ \psi ⟧ ^{\small{G}, m}_{\Gamma , \mu }) =\lambda \bigl (\textstyle \bigcup ⟦ \psi ⟧ _{\Gamma , \mu }^m\bigr)\) since \(\psi\) matches only single events. Then, by Equations (12g) and (14f), and our inductive assumption we get
    \(\begin{align*} \bigl \lbrace \Xi ^{\Gamma , \mu }_{\phi \mathbin {\small {noskip}}\psi }(D)\;|\; D & \in ⟦ \phi \mathbin {\small {noskip}}\psi ⟧ _{\Gamma ,\mu }^m\bigr \rbrace \\ & =\bigl \lbrace \Xi ^{\Gamma , \mu }_{\phi }(D)\;|\; D\in ⟦ \phi ⟧ _{\Gamma , \mu }^m\wedge [m, \max (\lambda (D)))\cap \lambda \bigl (\textstyle \bigcup ⟦ \psi ⟧ _{\Gamma , \mu }^m\bigr)\subseteq \lambda (D)\bigr \rbrace \\ & =\bigl \lbrace (X, \sigma)\in ⟦ \phi ⟧ _{\Gamma , \mu }^{\small{G},m}\;|\; [m, \max (X))\cap \textstyle \bigcup \pi _1(⟦ \psi ⟧ ^{\small{G}, m}_{\Gamma , \mu })\subseteq X\bigr \rbrace =⟦ \phi \mathbin {\small {noskip}}\psi ⟧ ^{\small{G},m}_{\Gamma , \mu }. \end{align*}\)
    Now let us consider the final case of \(\phi {+}\small {agr}(a_0:f_0(E_0),\ldots)\mathbin {\small {as}}x\) (the case of \(\phi {+}\) is similar and left to the reader). By Equation (13) we have \(\sigma ^{\bigcup _i([i]+D_i), \Gamma , \mu }_{{\phi +}\small {agr}(a_0:f_0(E_0),\ldots)\mathbin {\small {as}}x} :=\lbrace x/e^{\Gamma ,\mu }_{\sigma ^{D_0,\Gamma , \mu }_\phi ,\sigma ^{D_1,\Gamma , \mu }_\phi , \ldots }\rbrace\) , where
    \(\begin{equation*} e^{\Gamma ,\mu }_{\sigma _0,\sigma _1,\ldots ,\sigma _n} :=\small {agr}\bigl (⟦ f_0⟧ ([⟦ E_0⟧ _{\mu \circ \sigma _0}, ⟦ E_0⟧ _{\mu \circ \sigma _1},\ldots ]),\ldots , ⟦ f_n⟧ ([\ldots ]), \Gamma _{\Lambda (D_n)}\mathbin {.}\tau \bigr) \end{equation*}\)
    Then, by inductive assumption and Equations (14g) and (12i) we have that
    \(\begin{align*} & \bigl \lbrace \Xi ^{\Gamma , \mu }_{\phi {+}\small {agr}(a_0:f_0(E_0),\ldots)\mathbin {\small {as}}x}(D)\;|\; D\in ⟦ \phi {+}\small {agr}(a_0:f_0(E_0),\ldots)\mathbin {\small {as}}x⟧ _{\Gamma ,\mu }^m\bigr \rbrace \\ & =\bigcup _{n=0}^\infty \bigl \lbrace \bigl (\bigcup _i\lambda (D_i),\lbrace x/e^{\Gamma ,\mu }_{\sigma ^{D_0,\Gamma , \mu }_\phi ,\sigma ^{D_1,\Gamma , \mu }_\phi , \ldots }\rbrace \bigr)\;\big |\;D_0\in ⟦ \phi ⟧ _{\Gamma , \mu }^{m}\wedge D_1\in ⟦ \phi ⟧ _{\Gamma , \mu }^{\Lambda (D_0)+1}\wedge \cdots \wedge D_n\in ⟦ \phi ⟧ _{\Gamma , \mu }^{\Lambda (D_{n-1})+1}\bigr \rbrace \\ & =\bigcup _{n=0}^\infty \bigl \lbrace \bigl (\bigcup _iX_i,\lbrace x/e^{\Gamma ,\mu }_{\sigma _0,\sigma _1, \ldots }\rbrace \bigr)\;\big |\;(X_0,\sigma _0)\in ⟦ \phi ⟧ _{\Gamma , \mu }^{\small{G},m}\wedge (X_1,\sigma _1)\in ⟦ \phi ⟧ _{\Gamma , \mu }^{\small{G},\max (X_0)+1}\wedge \cdots \bigr \rbrace \\ & =⟦ \phi {+}\small {agr}(a_0:f_0(E_0),\ldots)\mathbin {\small {as}}x⟧ _{\Gamma ,\mu }^{\small{G},m}. \end{align*}\)

    References

    [1]
    Darko Anicic, Paul Fodor, Sebastian Rudolph, Roland Stühmer, Nenad Stojanovic, and Rudi Studer. 2011. ETALIS: Rule-Based Reasoning in Event Processing. Springer, Berlin, 99–124.
    [2]
    Alexander Artikis, Alessandro Margara, Martin Ugarte, Stijn Vansummeren, and Matthias Weidlich. 2017. Complex event recognition languages: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-Based Systems (DEBS’17). Association for Computing Machinery, 7–10.
    [3]
    Alexander Artikis, Marek Sergot, and Georgios Paliouras. 2014. An event calculus for event recognition. IEEE Transactions on Knowledge and Data Engineering 27, 4 (2014), 895–908.
    [4]
    Alexander Artikis, Anastasios Skarlatidis, François Portet, and Georgios Paliouras. 2012. Logic-based event recognition. The Knowledge Engineering Review 27, 4 (2012), 469–506.
    [5]
    Franz Baader and Tobias Nipkow. 1999. Term Rewriting and All That. Cambridge University Press, Cambridge, UK.
    [6]
    Lars Brenna, Alan Demers, Johannes Gehrke, Mingsheng Hong, Joel Ossher, Biswanath Panda, Mirek Riedewald, Mohit Thatte, and Walker White. 2007. Cayuga: A high-performance event processing engine. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD’07). Association for Computing Machinery, 1100–1102.
    [7]
    Loli Burgueño, Juan Boubeta-Puig, and Antonio Vallecillo. 2018. Formalizing complex event processing systems in Maude. IEEE Access 6 (2018), 23222–23241.
    [8]
    Cristian Cadar and Koushik Sen. 2013. Symbolic execution for software testing: Three decades later. Communications of the ACM 56, 2 (2013), 82–90.
    [9]
    Gianpaolo Cugola and Alessandro Margara. 2010. TESLA: A formally defined event specification language. In Proceedings of the 4th ACM International Conference on Distributed Event-Based Systems (DEBS’10). Association for Computing Machinery, 50–61.
    [10]
    Gianpaolo Cugola and Alessandro Margara. 2012. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys (CSUR) 44, 3 (2012), 1–62.
    [11]
    Alan J. Demers, Johannes Gehrke, Biswanath Panda, Mirek Riedewald, Varun Sharma, and Walker M. White. 2007. Cayuga: A general purpose event monitoring system. In Proceedings of CIDR. 412–422.
    [12]
    Christophe Dousson and Pierre Le Maigat. 2007. Chronicle recognition improvement using temporal focusing and hierarchization. In Proceedings of the IJCAI. 324–329.
    [13]
    Francisco Durán, Steven Eker, Santiago Escobar, Narciso Martí-Oliet, José Meseguer, Rubén Rubio, and Carolyn Talcott. 2020. Programming and symbolic computation in Maude. Journal of Logical and Algebraic Methods in Programming 110, special issue (2020), 100497.
    [14]
    Françoise Fabret, H. Arno Jacobsen, François Llirbat, João Pereira, Kenneth A. Ross, and Dennis Shasha. 2001. Filtering algorithms and implementation for very fast publish/subscribe systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD’01). Association for Computing Machinery, 115–126.
    [15]
    Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. 2018. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD’18). Association for Computing Machinery, 1433–1445.
    [16]
    Antony Galton and Juan Carlos Augusto. 2002. Two approaches to event definition. In Proceedings of the Database and Expert Systems Applications. Abdelkader Hameurlain, Rosine Cicchetti, and Roland Traunmüller (Eds.), Springer, Berlin, 547–556.
    [17]
    E. Giovannetti and C. Moiso. 1988. A completeness result for E-unification algorithms based on conditional narrowing. In Proceedings of the Foundations of Logic and Functional Programming. Mauro Boscarol, Luigia Carlucci Aiello, and Giorgio Levi (Eds.), Springer, Berlin, 157–167.
    [18]
    Alejandro Grez, Cristian Riveros, and Martín Ugarte. 2019. A formal framework for complex event processing. In Proceedings of the 22nd International Conference on Database Theory (ICDT’19). Pablo Barcelo and Marco Calautti (Eds.), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 127, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 5:1–5:18.
    [19]
    Alejandro Grez, Cristian Riveros, Martín Ugarte, and Stijn Vansummeren. 2020. On the expressiveness of languages for complex event recognition. In Proceedings of the 23rd International Conference on Database Theory (ICDT’20). Carsten Lutz and Jean Christoph Jung (Eds.), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 155, Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 15:1–15:17.
    [20]
    Alejandro Grez, Cristian Riveros, Martín Ugarte, and Stijn Vansummeren. 2021. A formal framework for complex event recognition. ACM Transactions on Database Systems 46, 4 (2021), 1–49.
    [21]
    Jeroen Groenendijk and Martin Stokhof. 1991. Dynamic predicate logic. Linguistics and Philosophy 14, 1 (1991), 39–100.
    [22]
    Steffen Hölldobler and Josef Schneeberger. 1990. A new deductive approach to planning. New Generation Computing 8, 3 (1990), 225–244.
    [23]
    James C. King. 1976. Symbolic execution and program testing. Communications of the ACM 19, 7 (1976), 385–394.
    [24]
    David C. Luckham. 2011. Event Processing for Business: Organizing the Real-time Enterprise. John Wiley and Sons.
    [25]
    Paweł Maślanka and Bartosz Zieliński. 2021. Poster: Testing complex event patterns. In Proceedings of the 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST’21). 474–477.
    [26]
    Yuan Mei and Samuel Madden. 2009. ZStream: A cost-based query processor for adaptively detecting composite events. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD’09). Association for Computing Machinery, 193–206.
    [27]
    José Meseguer and Prasanna Thati. 2007. Symbolic reachability analysis using narrowing and its application to verification of cryptographic protocols. Higher-Order and Symbolic Computation 20, 1-2 (2007), 123–160.
    [28]
    Eugene Wu, Yanlei Diao, and Shariq Rizvi. 2006. High-performance complex event processing over streams. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD’06). Association for Computing Machinery, 407–418.
    [29]
    Haopeng Zhang, Yanlei Diao, and Neil Immerman. 2014. On complexity and optimization of expensive queries in complex event processing. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD’14). Association for Computing Machinery, 217–228.
    [30]
    Bartosz Zieliński. 2022. PatternExplorer. Retrieved 22 July 2023 from https://github.com/BartoszPZielinski/PatternExplorer
