Transforming Property Graphs (Extended Version)

A shorter version of this paper has been accepted for publication in VLDB 2024.

Angela Bonifati (Lyon 1 Univ., Liris CNRS & IUF, angela.bonifati@univ-lyon1.fr), Filip Murlak (Univ. of Warsaw, fmurlak@mimuw.edu.pl), and Yann Ramusat (Lyon 1 Univ., Liris CNRS, yann.ramusat@liris.cnrs.fr, ORCID 0000-0001-5109-3700)
Abstract.

In this paper, we study a declarative framework for specifying transformations of property graphs. In order to express such transformations, we leverage queries formulated in the Graph Pattern Calculus (GPC), which is an abstraction of the common core of recent standard graph query languages, GQL and SQL/PGQ. In contrast to previous frameworks targeting graph topology only, we focus on the impact of data values on the transformations—which is crucial in addressing users’ needs. In particular, we study the complexity of checking if the transformation rules do not specify conflicting values for properties, and we show this is closely related to the satisfiability problem for GPC. We prove that both problems are PSpace-complete.

We have implemented our framework in openCypher. We show the flexibility and usability of our framework by leveraging an existing data integration benchmark, adapting it to our needs. We also evaluate the incurred overhead of detecting potential inconsistencies at run-time, and the impact of several optimization tools in a Cypher-based graph database, by providing a comprehensive comparison of different implementation variants. The results of our experimental study show that our framework exhibits large practical benefits for transforming property graphs compared to ad-hoc transformation scripts.

PVLDB Reference Format: PVLDB, 14(1): XXX-XXX, 2020.
doi:XX.XX/XXX.XX

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/yannramusat/TPG.

1. Introduction

Query languages for property graphs—those supported by existing systems, such as Neo4j’s openCypher (francis_cypher_2018) or Oracle’s PGQL (10.1145/2960414.2960421), and those codified in international standards, such as GQL and SQL/PGQ (francis_researchers_2023)—define their semantics in terms of sets of tuples. This is inadequate for data interoperability tasks such as data migration or data integration, where outputs of some queries are to be fed directly to other queries. To support this kind of composability, queries should be able to output property graphs rather than sets of tuples. Such queries can be seen as transformations, turning an input property graph into an output property graph.

Interoperability of graph data has received little attention so far, compared to the relational and XML data models (10.5555/1941440). Notable research in the area (10.1145/2448496.2448520; 10.1145/3584372.3588654) relies on the simplified graph data model that had been devised to provide the foundations for querying the topology of graphs with formalisms such as conjunctive regular path queries (CRPQs) (10.1145/2463664.2465216) or regular queries (vardi_theory_2016). As the simplified graph data model ignores the presence of properties (key-value pairs stored in nodes and edges), it is too far from the property graph models used in graph databases such as Neo4j or Tigergraph, and cannot be a foundation for practical solutions. These currently rely on opaque external libraries, such as Neo4j’s APOC (apoc), or involve complex handcrafted queries (graphacademy), as illustrated below.

(i) Input property graph $G$ containing ingested relational data.

\begin{minted}{cypher}
MATCH (u:User)
MATCH (a:Address) WHERE a.aid = u.address
MATCH (l:Location) WHERE l.aid = u.address
WITH u, collect(a) AS Addresses, collect(l) AS Locations
CREATE (p:Person)
SET p.name = u.name
WITH p, Addresses, Locations
UNWIND Addresses AS a
MERGE (ci:City {name: a.cityName})
SET ci.code = a.cityCode
MERGE (p)-[:HasAddress]->(ci)
WITH p, Locations
UNWIND Locations AS l
MERGE (co:Country {name: l.countryName})
SET co.code = l.countryCode
MERGE (p)-[:HasLocation]->(co)
\end{minted}

(ii) Ad-hoc transformation script in openCypher.

(iii) Resulting output property graph $H$.

Figure 1. Ad-hoc transformation of raw ingested data.
Example 1.1.

Figure 1 illustrates a graph transformation scenario, in which a user has imported relational data into the popular Neo4j graph database and would like to reshape it into a semantically meaningful property graph instance, to facilitate navigational querying. The relational data consists of three tables,

\begin{align*}
&\mathsf{User}(\underline{name},\underline{address})\,,\;\mathsf{Address}(\underline{aid},cityName,cityCode)\,,\\
&\mathsf{Location}(\underline{aid},countryName,countryCode)\,,
\end{align*}

with primary keys consisting of the underlined attributes, and having two foreign keys: $aid$ references $address$ in $\mathsf{User}$ from both $\mathsf{Address}$ and $\mathsf{Location}$.

Figure 1(i) shows a rudimentary property graph obtained after importing the relational data, using a generic ingestion method, such as Cypher’s \texttt{LOAD CSV} clause. In the resulting property graph, each node represents a single tuple of the relational instance, with the relation’s name represented as the label, and the attributes stored in the node’s properties. Note that there are no edges in this property graph: relationships between places, locations, and users are represented by way of foreign keys, just like in the original relational instance. Needless to say, this is not the best way to represent a relational instance as a property graph.

The user now wants to transform the instance in Figure 1(i) into one that makes better use of the property graph data model by facilitating navigational operations in queries like “Which people live in the same city as Jean?”. The user intends to create a node for each person, city, and country, and replace foreign key references with explicit relationships. Figure 1(ii) shows an implementation of this transformation in openCypher that closely follows a graph refactoring solution described in Neo4j’s GraphAcademy (graphacademy). The reader will notice how difficult it is to relate the constructs of this query to the informal specification above. Even just making sense of the \texttt{MERGE} clauses interleaved with implicit grouping and list manipulations (\texttt{UNWIND} and \texttt{collect}) is a daunting task for an unacquainted user. But the query leverages other advanced idioms too. For instance, the \texttt{CREATE} clause creates as many nodes of type \texttt{Person} as there are rows output by the previous \texttt{WITH} clause: one for each $u$, due to implicit grouping. The \texttt{MERGE} clause for cities generates one \texttt{City} node for each distinct value found in property \texttt{cityName} across all $a$’s; this is because the property \texttt{name} is specified as \texttt{a.cityName} in the \texttt{MERGE} clause. Similarly, the \texttt{MERGE} clause for countries creates a single \texttt{Country} node for each distinct value found in property \texttt{countryName}.

Figure 1(iii) shows the output property graph obtained by running the script on the input property graph from Figure 1(i). It reveals that the ad-hoc transformation fails to account for the fact that cities are weak entities that cannot be identified by their name alone, and conflates Luxemburg in Europe with Luxemburg in the US. Detecting such errors is hard because openCypher lacks a transparent mechanism for specifying identities of created elements. $\blacktriangleleft$
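The conflation described in this example can be reproduced outside a database. The following minimal Python sketch mimics the get-or-create behavior of a merge keyed on a chosen set of properties. It is an illustration under stated assumptions, not Neo4j's implementation: the helper `merge_nodes`, the sample rows, and the code values `LU` and `US-WI` are hypothetical and are not taken from Figure 1.

```python
def merge_nodes(rows, key_fields):
    """Get-or-create one node per distinct combination of key_fields.

    Simplified stand-in for MERGE followed by SET (hypothetical helper,
    for illustration only).
    """
    nodes = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        # Reuse the node if this key was seen before, otherwise create it.
        node = nodes.setdefault(key, {})
        # Later rows overwrite earlier values, like a SET after a MERGE.
        node.update(row)
    return nodes


# Hypothetical Address rows: two different cities that share a name.
addresses = [
    {"cityName": "Luxemburg", "cityCode": "LU"},
    {"cityName": "Luxemburg", "cityCode": "US-WI"},
]

# Identity given by the name alone: both rows land on the same node.
by_name = merge_nodes(addresses, ["cityName"])
# Identity given by name and code: the two cities stay apart.
by_name_and_code = merge_nodes(addresses, ["cityName", "cityCode"])

print(len(by_name), len(by_name_and_code))
```

Keying on the name alone silently overwrites the code of the first city, which is the kind of error that an explicit specification of identities is meant to expose.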

As we have seen, ad-hoc transformation scripts are error-prone and hard to interpret and analyze. Moving away from handcrafted implementations to declarative specifications has long been recognized as pivotal for solving data programmability problems (bernstein_model_2007). The aim of this work is to lay the theoretical foundations for the declarative specification of property graph transformations, and facilitate practical solutions for turning such specifications into executable scripts in modern property graph query languages. Constraint-based, fully declarative formalisms, such as schema mappings for relational (fagin_data_2005; 10.1145/1061318.1061323; bellahsene2011schema) and graph (10.1145/2448496.2448520; 10.1145/3034786.3056113) data, allow multiple target solutions, leading to ambiguous transformations  (10.5555/1182635.1164136; DBLP:conf/sigmod/BonifatiCCT17). For property graphs, this makes the schema mapping problem undecidable even under strong restrictions (10.1145/3034786.3056113). We avoid this problem by focusing on transformations that return a unique, well-defined output instance for each input instance, thus facilitating direct execution.

We propose a rule-based formalism that allows the user to describe the output property graph based on the input property graph, by specifying not only labels, properties, and relationships between output elements, but also their identities. The formalism builds upon the Graph Pattern Calculus (GPC) (10.1145/3584372.3588662), which is an abstraction of the common graph pattern matching features of GQL and SQL/PGQ (DBLP:conf/sigmod/DeutschFGHLLLMM22). GPC is adequate in terms of expressive power: it has ample facilities for querying properties and even on property-less graphs it goes well beyond classical formalisms such as RPQs and CRPQs. It is suitable for theoretical investigation owing to its concise syntax and rigorous semantics. It should also keep our proposal future-proof by ensuring out-of-the-box compatibility with the expected implementations of these standards. Until then, we can rely on the already implemented graph query languages, such as Neo4j’s openCypher (francis_cypher_2018) or Oracle’s PGQL (10.1145/2960414.2960421), which were a strong inspiration for GQL. Indeed, the actual query language used in the rules can be seen as a parameter of the framework.

In contrast to the purely topological formalism of (10.1145/3584372.3588654), specifications of property-aware transformations can easily become inconsistent, when they attempt to specify two different values for the same property of a given element. Detecting such conflicts naturally comes to the foreground of static analysis. As we show, this problem is tightly connected to the satisfiability problem for GPC+ (GPC extended with union and projection, also introduced in (10.1145/3584372.3588662)), which is to decide if there is a property graph satisfying a given GPC+ query. Exploiting this connection, we establish tight complexity bounds for both these problems, showing that they are PSpace-complete. To the best of our knowledge, this is the first static-analysis result on GPC. Given that query satisfiability is the workhorse of static analysis throughout database theory, we believe that with the adoption of the GQL standard our result will find other uses. An immediate consequence for property graph transformations is that consistency cannot be checked statically due to the prohibitive cost, and conflicts must be handled dynamically, during the execution of the transformation.
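Dynamic conflict handling can be pictured as a guarded write. The sketch below is a hedged illustration in Python, not the implementation described in this paper: `ConflictError` and `set_property` are hypothetical names, the element id and values are made up, and the output graph is reduced to a dictionary from (element id, key) pairs to values.

```python
class ConflictError(Exception):
    """Raised when two writes disagree on the value of one property."""


def set_property(props, element_id, key, value):
    """Record value for (element_id, key), rejecting conflicting writes."""
    slot = (element_id, key)
    if slot in props and props[slot] != value:
        raise ConflictError(
            f"{element_id}.{key}: {props[slot]!r} conflicts with {value!r}"
        )
    props[slot] = value


props = {}
set_property(props, "ci1", "code", "LU")
set_property(props, "ci1", "code", "LU")  # same value again: accepted
try:
    set_property(props, "ci1", "code", "US-WI")  # different value: conflict
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

Repeating an identical write is harmless; only a second, different value for the same property of the same element counts as a conflict.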

In order to prove that our formalism can serve as a foundation for practical data interoperability solutions, we provide a proof-of-concept implementation. As no existing query engine supports GQL yet, we rely on Neo4j’s open-source implementation of openCypher (francis_cypher_2018; green_updating_2019), which offers most of the functionalities of GQL described in (francis_researchers_2023). We study the case when the rules are provided by the users and describe a generic, easily automated method of translating these rules into executable openCypher scripts, and apply it manually to selected realistic property graph transformations derived from real-world data integration scenarios of the iBench benchmark suite (arocena_ibench_2015). We perform a comprehensive experimental study gauging the efficiency of conflict detection and the effect of rule order and various optimizations on several implementation variants. We confirm that our implementation performs well in all scenarios and scales to large input data. We also demonstrate that our framework can be successfully applied in a concrete data integration scenario on real-world data (ICIJ-github), and report the results of a small-scale user study confirming that our framework enhances readability and usability of transformations.

In summary, our main contributions are the following.

  • We propose a comprehensive declarative formalism for specifying transformations of property graphs, compatible with SQL/PGQ and the upcoming GQL standard.

  • We identify consistency as a key static-analysis problem, and show that it is interreducible with satisfiability of GPC+ queries and that both are PSpace-complete.

  • We provide a proof-of-concept implementation of our formalism in openCypher, and apply it to realistic scenarios of graph-shaped data transformations.

  • We show experimentally that our solution scales to large input data, handles on-the-fly conflict detection with low overhead, and enhances readability and usability, without sacrificing performance.

The rest of the paper is organized as follows. In Section 2, we recall the property graph data model along with GPC. In Section 4, we give the syntax and semantics of our graph transformation formalism. In Section 6, we discuss consistency in relation to satisfiability of GPC+ queries, and establish the complexity bounds. In Section 8, we describe our proof-of-concept implementation. In Section 9, we present both the experiments and the user study. In Section 11 and Section 12, we discuss related work and conclude the paper.

2. Preliminaries

We briefly introduce the basic concepts of the property graph data model and the Graph Pattern Calculus (GPC) that we use in this paper. We mostly follow the definitions from (10.1145/3584372.3588662).

3.1. Data model

We now introduce further basic notation for the property graph data model.

A path $p\coloneqq(n_{0},e_{1},n_{1},\dots,e_{k},n_{k})$ is an alternating sequence of nodes and edges, which starts and ends with nodes. Given a path $p$, we denote by $\mathsf{src}(p)$ and $\mathsf{tgt}(p)$ the first and last node of $p$; in this case, $\mathsf{src}(p)=n_{0}$ and $\mathsf{tgt}(p)=n_{k}$. The length $\mathsf{len}(p)\in\mathbb{N}$ of $p$ is the number of edges in the path; if $\mathsf{len}(p)=0$, then the path consists of a single node, which is both the source and the target. As usual, the concatenation $p\cdot p^{\prime}$ of two paths $p$ and $p^{\prime}$ is defined whenever $\mathsf{tgt}(p)=\mathsf{src}(p^{\prime})$.
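The path notation above can be made concrete with a small data structure. The following Python sketch is one possible encoding, chosen for illustration: the class name `Path`, its field names, and the sample identifiers are assumptions, and paths are stored as a tuple of nodes plus a tuple of edges rather than as one alternating sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Path n0, e1, n1, ..., ek, nk stored as two parallel tuples."""
    nodes: tuple       # (n0, ..., nk), never empty
    edges: tuple = ()  # (e1, ..., ek)

    def __post_init__(self):
        if len(self.nodes) != len(self.edges) + 1:
            raise ValueError("a path needs exactly one more node than edges")

    @property
    def src(self):
        return self.nodes[0]

    @property
    def tgt(self):
        return self.nodes[-1]

    def __len__(self):
        # The length of a path is its number of edges.
        return len(self.edges)

    def concat(self, other):
        """Return the concatenation, defined only when tgt(self) = src(other)."""
        if self.tgt != other.src:
            raise ValueError("tgt of the first path must equal src of the second")
        # The shared node appears only once in the result.
        return Path(self.nodes + other.nodes[1:], self.edges + other.edges)


p = Path(("n0", "n1"), ("e1",))
q = Path(("n1", "n2"), ("e2",))
r = p.concat(q)        # n0, e1, n1, e2, n2
single = Path(("n0",))  # path of length 0
```

Lengths add up under concatenation, and a path of length 0 has the same source and target.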

Conforming to the formal specification originating from (10.1145/3584372.3588662), a property graph $G$ is a tuple $\langle N,E,\lambda,\mathsf{src},\mathsf{tgt},\delta\rangle$, where $\mathcal{O}$, $\mathcal{L}$, $\mathcal{K}$ and $\mathsf{Const}$ are disjoint countable sets of object identifiers (ids), labels, keys (also called property names) and constants (data values), and

  • $N\subset\mathcal{O}$ is the finite set of node ids in $G$;

  • $E\subset\mathcal{O}$ is the finite set of edge ids;

  • $N$ and $E$ are disjoint;

  • $\lambda:N\cup E\to 2^{\mathcal{L}}$ is a labeling function that associates to every id a (possibly empty) finite set of labels;

  • $\mathsf{src},\mathsf{tgt}:E\to N$ define the source and target of each edge;

  • $\delta:(N\cup E)\times\mathcal{K}\to\mathsf{Const}$ is a finite-domain partial function that associates a constant with an id and a key from $\mathcal{K}$.

The node ids and edge ids will be respectively called the nodes and edges of the property graph.

That is to say, a property graph is a multigraph in the sense that two vertices may be connected by more than one edge, even with these edges having the same label(s), and that loops are permitted. All the elements of the database (the nodes and the edges) store a finite set of property-value pairs, represented by δ𝛿\deltaitalic_δ.
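The definition above translates almost directly into a container type. The Python sketch below is a minimal illustration under stated assumptions: the class `PropertyGraph`, its method names, and the sample ids and values are hypothetical, and the partial function $\delta$ is stored as a dictionary keyed by (id, key) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    nodes: set = field(default_factory=set)     # N
    edges: set = field(default_factory=set)     # E
    labels: dict = field(default_factory=dict)  # lambda: id -> set of labels
    src: dict = field(default_factory=dict)     # src: edge id -> node id
    tgt: dict = field(default_factory=dict)     # tgt: edge id -> node id
    props: dict = field(default_factory=dict)   # delta: (id, key) -> constant

    def add_node(self, nid, labels=(), **props):
        if nid in self.edges:
            raise ValueError("N and E must be disjoint")
        self.nodes.add(nid)
        self.labels[nid] = set(labels)
        for key, value in props.items():
            self.props[(nid, key)] = value

    def add_edge(self, eid, source, target, labels=(), **props):
        if eid in self.nodes:
            raise ValueError("N and E must be disjoint")
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("src and tgt must map edges to existing nodes")
        self.edges.add(eid)
        self.src[eid], self.tgt[eid] = source, target
        self.labels[eid] = set(labels)
        for key, value in props.items():
            self.props[(eid, key)] = value


g = PropertyGraph()
g.add_node("p1", ["Person"], name="Jean")
g.add_node("ci1", ["City"], name="Luxemburg", code="LU")
# Parallel edges with the same label and loops are both allowed.
g.add_edge("e1", "p1", "ci1", ["HasAddress"])
g.add_edge("e2", "p1", "ci1", ["HasAddress"])
g.add_edge("e3", "p1", "p1", ["Knows"])
```

Because edges carry their own ids, `e1` and `e2` remain distinct even though they connect the same nodes with the same label.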

A property graph is presented in Figure 1(iii). It contains information about people’s locations, such as the $\mathsf{City}$ and the $\mathsf{Country}$ they live in. We see that it contains one node with label $\mathsf{City}$ and two nodes with label $\mathsf{Country}$; two edges with label $\mathsf{HasLocation}$ and two edges with label $\mathsf{HasAddress}$; all nodes have property $name$; and all nodes with label $\mathsf{City}$ or $\mathsf{Country}$ have an additional property $code$. Annotations $\mathsf{p1},\mathsf{p2},\dots,\mathsf{co2}$ are node identifiers; edge identifiers are not shown.

3.3. Graph Pattern Calculus

We briefly summarize the syntax and semantics of GPC (10.1145/3584372.3588662), focusing only on the concepts we need to formally define our property graph transformation rules.

The atomic GPC patterns are node and edge patterns. A node pattern is of the form $(x:\ell)$ and an edge pattern is of the form $\xrightarrow{x:\ell}$. In both cases, $x$ is an optional variable (picked from a countably infinite set $\mathcal{X}$ of variables) which, if present, is bound to the matched element, and $\ell$ is an optional label indicating that we want to restrict to $\ell$-elements. The arrow in an edge pattern may indicate one of the two possible directions: forward ($\rightarrow$) and backward ($\leftarrow$). A GPC pattern, denoted $\pi$, is inductively constructed on top of the atomic patterns by using arbitrarily many union ($\pi+\pi$), concatenation ($\pi\cdot\pi$), conditioning ($\pi_{\langle\theta\rangle}$), and repetition ($\pi^{n..m}$) constructs.

A GPC query is of the form $\rho\,\pi$, with $\rho$ a restrictor among $\mathsf{simple}$, $\mathsf{trail}$, and $\mathsf{shortest}$, whose purpose is to ensure a finite result set: $\mathsf{simple}$ prevents repetition of nodes along a path, $\mathsf{trail}$ prevents repetition of edges, and $\mathsf{shortest}$ selects only the paths of minimal length among all the paths between two nodes.
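The effect of the three restrictors can be seen on a graph with a cycle, where unrestricted path enumeration would never terminate. The Python sketch below is an illustration only and does not evaluate GPC patterns: the functions `paths` and `shortest` and the sample graph are hypothetical, and a path is represented simply as a tuple of edge ids.

```python
def paths(edges, start, goal, restrictor):
    """Enumerate non-empty paths from start to goal as tuples of edge ids.

    edges maps an edge id to its (source, target) pair.
    restrictor is "simple" (no repeated node) or "trail" (no repeated edge).
    """
    found = []

    def extend(node, seen_nodes, used_edges):
        if node == goal and used_edges:
            found.append(tuple(used_edges))
        for eid, (s, t) in sorted(edges.items()):
            if s != node:
                continue
            if restrictor == "simple" and t in seen_nodes:
                continue
            if restrictor == "trail" and eid in used_edges:
                continue
            extend(t, seen_nodes | {t}, used_edges + [eid])

    extend(start, {start}, [])
    return found


def shortest(edges, start, goal):
    """Keep only the paths of minimal length between two distinct nodes.

    A shortest path between distinct nodes never repeats a node, so it is
    enough to search among the simple paths.
    """
    candidates = paths(edges, start, goal, "simple")
    if not candidates:
        return []
    best = min(len(p) for p in candidates)
    return [p for p in candidates if len(p) == best]


# Hypothetical graph: a -> b, a cycle b -> c -> b, b -> d, and a -> d.
edges = {
    "e1": ("a", "b"),
    "e2": ("b", "c"),
    "e3": ("c", "b"),
    "e4": ("b", "d"),
    "e5": ("a", "d"),
}

simple_paths = paths(edges, "a", "d", "simple")
trail_paths = paths(edges, "a", "d", "trail")
shortest_paths = shortest(edges, "a", "d")
```

On this graph, the trail restrictor admits one extra path that goes once around the cycle, whereas the simple restrictor rejects it because node `b` would be visited twice.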

The structure of a GPC query can be inspected using a type system, i.e., a set of typing rules (10.1145/3584372.3588662). A query is well-typed if the rules allow a unique type to be deduced for every variable appearing in the query. When an expression $Q$ is well-typed, the schema $\mathsf{sch}(Q)$ of this expression associates a type to each variable.

The answer of a GPC query $Q(\bar{x})$ on a property graph $G$, denoted $\llbracket Q\rrbracket_{G}^{\bar{x}}$, is a set of assignments. An assignment binds the variables $\bar{x}$ present in the query to values. The values that can be associated with a variable depend on the type deduced for that variable in the query. Hence, for each type $\tau$, there is a set of values $\mathcal{V}_{\tau}$. Values may be references to elements in the graph, e.g., for the $\mathsf{Node}$ and $\mathsf{Edge}$ types.

In all answers to the queries we need to define our property graph transformations, the variables are assigned values of the types $\mathsf{Node}$ and $\mathsf{Edge}$. In our framework, we use GPC queries extended with the capability to use conditioning on top of joins. This is not part of the specification in (10.1145/3584372.3588662), but it is planned to be in GQL (francis_researchers_2023).

In the following, we introduce GPC by means of examples. The reader can refer to (10.1145/3584372.3588662) for more details on GPC, and to (francis_researchers_2023) for insight on how it will actually be used in GQL.

In Example 1.1, the user can retrieve from the property graph in Figure 1(iii) “all people who live in the same city as a person named $\$name$” using the following GPC query:

\[
\underset{\langle name=\$name\rangle}{(:\mathsf{Person})}\;\xrightarrow{:\mathsf{HasAddress}}\;(:\mathsf{City})\;\xleftarrow{:\mathsf{HasAddress}}\;(y:\mathsf{Person})
\]

This is an example of a path pattern, which is essentially a regular path query (10.1145/2463664.2465216) augmented with conditioning: the filter $\langle name=\$name\rangle$ checks that the value of the property $name$ is indeed the one sought. Given a property graph, this pattern returns the nodes that can be matched to $y$.
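To make the matching process tangible, the following Python sketch evaluates this particular path pattern by hand on a tiny graph. It is a hedged illustration, not a GPC evaluator: the function `same_city`, the triple-based edge encoding, and the sample data are assumptions, and restrictors are ignored, so the named person is also returned as a match for $y$.

```python
def same_city(labels, props, edges, name):
    """Return the nodes y matched by the pattern
    (:Person)<name=$name> -HasAddress-> (:City) <-HasAddress- (y:Person).

    labels: node id -> set of labels
    props:  (node id, key) -> value
    edges:  list of (source, edge label, target) triples
    """
    has_address = [(s, t) for s, lab, t in edges if lab == "HasAddress"]
    answers = set()
    for person, city in has_address:
        # Forward edge: a Person node whose name passes the filter.
        if "Person" not in labels[person] or "City" not in labels[city]:
            continue
        if props.get((person, "name")) != name:
            continue
        # Backward edge: any Person node pointing to the same City node.
        for y, other_city in has_address:
            if other_city == city and "Person" in labels[y]:
                answers.add(y)
    return answers


# Hypothetical data: p1 and p2 share a city, p3 lives elsewhere.
labels = {
    "p1": {"Person"}, "p2": {"Person"}, "p3": {"Person"},
    "ci1": {"City"}, "ci2": {"City"},
}
props = {("p1", "name"): "Jean", ("p2", "name"): "Ana", ("p3", "name"): "Bob"}
edges = [
    ("p1", "HasAddress", "ci1"),
    ("p2", "HasAddress", "ci1"),
    ("p3", "HasAddress", "ci2"),
]

result = same_city(labels, props, edges, "Jean")
```

The filter is applied only to the first node of the pattern; the variable $y$ is the only one whose bindings are returned.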

One can also use graph patterns in GPC (also called patterns or queries in this paper), which are conjunctions of path patterns. For example, the following query retrieves pairs of people living in the same city, such that one person knows, possibly indirectly, the other one:

\[
\begin{split}
&(x:\mathsf{Person})\xrightarrow{\,:\,\mathsf{HasAddress}\,}(\,:\mathsf{City})\xleftarrow{\,:\,\mathsf{HasAddress}\,}(y:\mathsf{Person}),\\
&(x)\xrightarrow{\,:\,\mathsf{Knows}\,}^{1..\infty}(y)\,.
\end{split}
\]

Such graph patterns generalize conjunctive two-way regular path queries (10.1145/2463664.2465216) to property graphs.

In GPC, each path pattern occurring in a graph pattern must be qualified with a restrictor among the set of $\mathsf{simple}$, $\mathsf{trail}$ (used by default if none is given) and $\mathsf{shortest}$. The restrictor’s purpose is to ensure a finite result set: $\mathsf{simple}$ prevents repetition of nodes along a path; $\mathsf{trail}$ prevents repetition of edges; and $\mathsf{shortest}$ selects only the paths of minimal length among all the paths between two nodes.
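To make the difference between the three restrictors concrete, here is a minimal Python sketch. It assumes a path is given as a pair of lists (node identifiers and edge identifiers); this representation and the function names `is_simple`, `is_trail` and `shortest` are choices made for this illustration and are not part of GPC.

```python
from typing import Hashable, List, Sequence, Tuple

# Assumed representation: (nodes, edges) with len(nodes) == len(edges) + 1.
Path = Tuple[Sequence[Hashable], Sequence[Hashable]]


def is_simple(path: Path) -> bool:
    """simple: no node is repeated along the path."""
    nodes, _ = path
    return len(set(nodes)) == len(nodes)


def is_trail(path: Path) -> bool:
    """trail: no edge is repeated along the path."""
    _, edges = path
    return len(set(edges)) == len(edges)


def shortest(paths: List[Path]) -> List[Path]:
    """shortest: keep only the candidates with the fewest edges.

    The caller is expected to pass candidate paths between the same
    two endpoints.
    """
    if not paths:
        return []
    best = min(len(edges) for _, edges in paths)
    return [p for p in paths if len(p[1]) == best]


# A path that returns to node "n1" through two distinct edges:
# it is a trail, but it is not simple.
cycle = (["n1", "n2", "n1"], ["e1", "e2"])

# Two candidate paths from "n1" to "n2".
direct = (["n1", "n2"], ["e1"])
detour = (["n1", "n3", "n2"], ["e3", "e4"])
```

In this sketch, `cycle` passes the trail check but fails the simple check, and `shortest([detour, direct])` keeps only `direct`.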

For ease of exposition, we simplify the semantics of GPC. We assume that a pattern only returns a set of bindings (in (10.1145/3584372.3588662), a tuple of witnessing paths is also returned with each binding). In GPC, variables used in the scope of a repetition operator, such as $1..\infty$, are called group variables and are bound to lists of nodes or edges. The remaining variables are called singleton variables and are bound to single nodes or edges. For the purpose of our transformation formalism, we restrict the output of queries to singleton variables.

For a GPC pattern $P$, a tuple $\bar{x}$ of singleton variables in $P$, and a property graph $G$, we write $\llbracket P\rrbracket^{\bar{x}}_{G}$ for the set of bindings of $\bar{x}$ returned by $P$ on $G$. For instance, if $P$ is the first query above and $G$ is the property graph depicted in Figure 1 (1iii), we have $\llbracket P\rrbracket^{y}_{G}=\{(y\mapsto\mathsf{p2})\}$ when $\$name$ is “Jean”, $\llbracket P\rrbracket^{y}_{G}=\{(y\mapsto\mathsf{p1})\}$ when $\$name$ is “Robert”, and $\llbracket P\rrbracket^{y}_{G}=\emptyset$ for any other name. (Note that the $\mathsf{trail}$ restrictor has been used by default.)

4. Property graph transformations

In this section, we present our declarative formalism for specifying property graph transformations. An example is given in Figure 2. The specification consists of two rules. Each rule collects data from the input graph with a GPC pattern on the left of $\Longrightarrow$, and specifies elements of the output graph using the expression on the right. This expression resembles a GPC pattern, but it has specifications of the element’s property values instead of filters and specifications of element identifiers instead of variables to be matched (new variables will reappear on the right-hand side, in a slightly different role). In what follows, we discuss how new identifiers are generated using Skolem functions (Section 5.1) and how identifiers, labels, and properties of output elements are specified using content constructors (Section 5.3). Then, we describe the general form of rules (Section 5.4) and explain their semantics in terms of a procedure that generates an output property graph given an input property graph (Section 5.5). We shall also see whether the transformation in Figure 2 fixes the issues discussed in Example 1.1.


5. Property graph transformations

We provide additional notation and definitions that are used in the main proofs of this paper.

5.1. Generating output identifiers

Throughout the paper we assume that all identifiers in input property graphs come from a countable set $\mathcal{S}\subset\mathcal{O}$ of input identifiers, and ensure that all identifiers in output property graphs come from a countable set $\mathcal{T}\subset\mathcal{O}\setminus\mathcal{S}$ of output identifiers. Following (10.1145/3584372.3588654), to generate identifiers in the output graph, we use Skolem functions. Specifically, we use a fixed injective Skolem function

\[
f:\bigcup_{k\in\mathbb{N}}\left(\mathcal{O}\cup\mathsf{Const}\cup\mathcal{L}\right)^{k}\rightarrow\mathcal{T}.
\]

In the context of relational schema mappings and data exchange, Skolem functions are used for value invention (10.1145/2463676.2465311), e.g., to generate artificial primary keys of new tuples in a way that makes it possible to refer to them in foreign keys. The way we use Skolem functions is similar, but not the same, because element identifiers are not data values. Rather, they are the property-graph analogue of object identifiers from the object-oriented data model (10.1145/290179.290182; 10.5555/645916.671975). Most of the time they are invisible to the user, and are not expected to carry any information beyond the identity of the element. Thus, the specific choice of the function $f$ is irrelevant, as long as $f$ is injective.
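Since only injectivity matters, one simple way to realize such a function in practice is to memoize argument tuples and hand out fresh identifiers. The Python sketch below illustrates this idea; the class name `Skolem` and the identifier prefix `"t"` are hypothetical. It assumes that the prefix is reserved for output identifiers and that arguments from $\mathcal{O}$, $\mathsf{Const}$ and $\mathcal{L}$ are represented by distinguishable Python values, so injectivity holds up to Python equality of the argument tuples.

```python
from itertools import count
from typing import Dict, Hashable, Tuple


class Skolem:
    """Maps argument tuples of any length to fresh output identifiers."""

    def __init__(self, prefix: str = "t") -> None:
        # Assumption: identifiers starting with `prefix` never occur
        # in input graphs, so outputs stay disjoint from inputs.
        self._ids: Dict[Tuple[Hashable, ...], str] = {}
        self._fresh = count()
        self._prefix = prefix

    def __call__(self, *args: Hashable) -> str:
        # Equal argument tuples yield the same identifier;
        # a tuple seen for the first time gets a fresh one.
        if args not in self._ids:
            self._ids[args] = f"{self._prefix}{next(self._fresh)}"
        return self._ids[args]


f = Skolem()
```

With this sketch, `f("u1")` returns the same identifier on every call, while `f("u1")`, `f("u2")` and `f("u1", "u2")` are pairwise distinct.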

Example 5.1.

In the rules in Figure 2 the Skolem function is kept implicit, but its arguments are explicitly listed. For example, in the subexpression $((u):\mathsf{Person})$, on the right-hand side of both rules, $(u)$ indicates that the identifier of the output node is $f(u)$ where $u$ is (the identifier of) a node selected from the input property graph by the left-hand side GPC pattern, such as $\mathsf{u1}$. Because the same nodes $u$ are selected in both rules, the subexpressions $((u):\mathsf{Person})$ in both rules will be referring to the same output nodes. Further, $(\ell.countryName)$ specifies the node identifier as $f(\ell.countryName)$, where $\ell.countryName$ refers to the value of the property $countryName$ in a node $\ell$ selected from the input graph, such as “United States”, and similarly for $(a.cityName)$. If $\ell.countryName=a.cityName$ for some $\ell$ and $a$, which can happen in our example, $(\ell.countryName)$ and $(a.cityName)$ will indicate the same output node. $\blacktriangleleft$


5.2. Generating output identifiers

Given a tuple of $m$ variables $\bar{x}=(x_{1},\dots,x_{m})$, we define the sets of value arguments and arguments for $\bar{x}$ as

\[
\mathcal{V}_{\bar{x}}\Coloneqq c\mid x_{i}.a
\]
\[
\mathcal{A}_{\bar{x}}\Coloneqq x_{i}\mid c\mid\ell\mid x_{i}.a
\]

where $x_{i}\in\bar{x}$, $c\in\mathsf{Const}$, $\ell\in\mathcal{L}$ and $a\in\mathcal{K}$. We denote by $\mathcal{A}^{k}_{\bar{x}}$ the set of all tuples of arguments for $\bar{x}$ of length $k$.

For a given property graph $G$, a tuple $A=(a_{1},\dots,a_{k})$ of arguments for $\bar{x}$ defines a function $\mathcal{O}^{m}\to\left(\mathcal{O}\cup\mathsf{Const}\cup\mathcal{L}\right)^{k}$ given by $(o_{1},\dots,o_{m})\mapsto(v_{1},\dots,v_{k})$ where, for all $1\leq i\leq k$:

  • $v_{i}\coloneqq o_{j}$ if $a_{i}=x_{j}$;

  • $v_{i}\coloneqq c$ if $a_{i}=c$;

  • $v_{i}\coloneqq\ell$ if $a_{i}=\ell$;

  • $v_{i}\coloneqq c$ if $a_{i}=x_{j}.a$ and $\delta_{G}(o_{j},a)=c$.
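The four cases above can be sketched in Python as follows. The tagged-tuple encoding of arguments, the zero-based variable positions, and the dictionary standing in for $\delta_{G}$ are assumptions made for this illustration only. The sample value "Luxemburg" for the property $countryName$ of $\mathsf{l1}$ follows the running example; the binding itself is illustrative.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Assumed encoding of an argument a_i:
#   ("var", j)      stands for x_j        (j is a zero-based position)
#   ("const", c)    stands for a constant c
#   ("label", l)    stands for a label l
#   ("prop", j, a)  stands for x_j.a
Arg = Tuple


def eval_args(args: Sequence[Arg],
              binding: Sequence[Hashable],
              delta: Dict[Tuple[Hashable, str], Hashable]
              ) -> Tuple[Hashable, ...]:
    """Maps (o_1, ..., o_m) to (v_1, ..., v_k), one case per bullet."""
    values = []
    for arg in args:
        kind = arg[0]
        if kind == "var":
            values.append(binding[arg[1]])
        elif kind in ("const", "label"):
            values.append(arg[1])
        elif kind == "prop":
            # Raises KeyError when the property is undefined on the element.
            values.append(delta[(binding[arg[1]], arg[2])])
        else:
            raise ValueError(f"unknown argument kind: {kind}")
    return tuple(values)


# Variables (a, u, l) bound to (a1, u1, l1); illustrative data.
delta = {("l1", "countryName"): "Luxemburg"}
binding = ("a1", "u1", "l1")
```

For example, `eval_args([("var", 1)], binding, delta)` yields `("u1",)`, and the tuple `[("prop", 2, "countryName")]` evaluates to `("Luxemburg",)`.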

5.3. Content constructors

A property graph transformation must be able to specify not only the identifiers of output elements, but also their labels and properties. For this purpose, we use content constructors. A content constructor is an expression of the form:

\begin{align*}
C(\bar{x})\coloneqq\{\;&\text{Id: }(a_{1},\dots,a_{k})\\
&\text{Labels: }L\\
&\text{Properties: }\langle k_{1}=v_{1},\dots,k_{n}=v_{n}\rangle\;\}
\end{align*}

where $\bar{x}$ is a tuple of variables, $L$ is a finite set of labels; each $k_{i}$ is a property name from $\mathcal{K}$; each $v_{i}$ is either a data value $c\in\mathsf{Const}$, or an expression of the form $x.a$ for $x\in\bar{x}$ and $a\in\mathcal{K}$; and each $a_{i}$ is either a constant $c\in\mathsf{Const}$, or a label $\ell\in\mathcal{L}$, or an expression of the form $x.a$ or $x$ for $x\in\bar{x}$ and $a\in\mathcal{K}$. The field Id specifies the identity of the node by listing the arguments to be fed to the Skolem function. The fields Labels and Properties specify labels and properties present in an element. Importantly, they do not forbid additional labels and properties, which will allow the user to split the description of an element across multiple rules, if the user so desires. We write $C.\mbox{Id}$ for the content of the Id field of $C$, and similarly for other fields. When $\bar{x}$ is clear from the context, we simply write $C$ instead of $C(\bar{x})$.

Example 5.2.

In the first rule in Figure 2, new $\mathsf{Country}$ nodes are described using the following content constructor:

\begin{align*}
C_{\mathrm{t}}(a,u,\ell)\coloneqq\{\;&\text{Id: }(\ell.countryName)\\
&\text{Labels: }\{\mathrm{Country}\}\\
&\text{Properties: }\langle name=\ell.countryName,\>code=\ell.countryCode\rangle\;\}.
\end{align*}

It specifies the identities and the values of properties $name$ and $code$ of new $\mathsf{Country}$ nodes in terms of the values of properties $countryName$ and $countryCode$ retrieved from elements to which variable $\ell$ is bound in the input graph. Rather than using the abstract syntax introduced above, the rule in Figure 2 presents $C_{\mathrm{t}}$ in GPC-like syntax (10.1145/3584372.3588662) as

\[
\underset{\langle name=\ell.countryName,\>code=\ell.countryCode\rangle}{\left((\ell.countryName):\mathsf{Country}\right)}\,.
\]
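As an illustration, a content constructor can be represented as a small record and instantiated under a binding. The sketch below is one possible encoding, not the paper's implementation: the class names `Var`, `Prop` and `ContentConstructor`, the function `instantiate`, and the dictionary standing in for the input graph's property function are assumptions. Labels used as Id arguments are simplified to plain strings, and the country code "LU" is a hypothetical value.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Hashable, Tuple, Union


@dataclass(frozen=True)
class Var:
    """The expression x: the element bound to variable `name`."""
    name: str


@dataclass(frozen=True)
class Prop:
    """The expression x.a: property `key` of the element bound to `name`."""
    name: str
    key: str


Term = Union[Var, Prop, str, int, float]


@dataclass
class ContentConstructor:
    id_args: Tuple[Term, ...]                    # the Id field
    labels: FrozenSet[str]                       # the Labels field
    properties: Dict[str, Term] = field(default_factory=dict)


def resolve(term: Term, binding: Dict[str, Hashable],
            delta: Dict[Tuple[Hashable, str], Hashable]) -> Hashable:
    if isinstance(term, Var):
        return binding[term.name]
    if isinstance(term, Prop):
        return delta[(binding[term.name], term.key)]
    return term  # a constant (or a label, in this simplified encoding)


def instantiate(c: ContentConstructor, binding, delta):
    """Returns the Skolem arguments, labels and property values for a binding."""
    ident = tuple(resolve(t, binding, delta) for t in c.id_args)
    props = {k: resolve(v, binding, delta) for k, v in c.properties.items()}
    return ident, c.labels, props


# The constructor for Country nodes from the example above.
c_t = ContentConstructor(
    id_args=(Prop("l", "countryName"),),
    labels=frozenset({"Country"}),
    properties={"name": Prop("l", "countryName"),
                "code": Prop("l", "countryCode")},
)

# Illustrative input data; the code "LU" is hypothetical.
delta = {("l1", "countryName"): "Luxemburg", ("l1", "countryCode"): "LU"}
ident, labels, props = instantiate(c_t, {"a": "a1", "u": "u1", "l": "l1"}, delta)
```

Here `ident` holds the arguments that would be passed to the Skolem function, so two constructors that produce equal argument tuples describe the same output element.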

5.4. Transformations

We describe transformations in terms of property graph transformation rules. Each rule brings together the data retrieved from the input property graph by a GPC pattern and a description of output elements expressed with content constructors.

We recall that the semantics of GPC is defined such that a query returns tuples. Each tuple represents a binding of singleton variables in that query to elements of the property graph.

We have two kinds of property graph transformation rules: node rules and edge rules. A node rule is an expression of the form:

\[
P(\bar{x})\implies(C(\bar{x}))
\]

where $P(\bar{x})$ is a GPC query with singleton variables $\bar{x}$ and $C(\bar{x})$ is a content constructor. An edge rule is an expression of the form:

\[
P(\bar{x})\implies(C_{\mathrm{s}}(\bar{x}))\xrightarrow{\,C(\bar{x})\,}(C_{\mathrm{t}}(\bar{x}))
\]

where $P(\bar{x})$ is a GPC query with singleton variables $\bar{x}$ and $C_{\mathrm{s}}(\bar{x})$, $C(\bar{x})$ and $C_{\mathrm{t}}(\bar{x})$ are content constructors. Finally, a property graph transformation is a finite set of property graph transformation rules.

Example 5.3.

The first edge rule in Figure 2 is built from the content constructor $C_{\mathrm{t}}$ as defined in Example 5.2, and from the following two content constructors $C_{\mathrm{s}}$ and $C$:

\begin{align*}
C_{\mathrm{s}}(a,u,\ell)\coloneqq\{\;&\text{Id: }(u)\\
&\text{Labels: }\{\mathrm{Person}\}\\
&\text{Properties: }\langle name=u.name\rangle\;\},\\
C(a,u,\ell)\coloneqq\{\;&\text{Id: }()\\
&\text{Labels: }\{\mathrm{HasLocation}\}\\
&\text{Properties: }\langle\rangle\;\}.\qquad\qquad\blacktriangleleft
\end{align*}

The above definition allows specifying multiple labels with a single constructor as well as specifying the labels of a single element using multiple rules. This feature, illustrated in the following example, is crucial. Without it, in the presence of type hierarchies, one would need negation in the query language to avoid duplicating output elements. In our setting, GPC does not permit negating patterns and it is unlikely for the complexity upper bounds in Section 6.1 to hold when this form of negation is added.

Example 5.4.

As discussed in Example 5.1, if for some nodes $\ell$ and $a$ selected by the GPC patterns in the rules of Figure 2, $\ell.\textit{countryName}$ and $a.\textit{cityName}$ are equal, then the $C_{\mathrm{t}}$ constructors in both rules refer to the same output node. For instance, for $\ell=\mathsf{l1}$ and $a=\mathsf{a1}$ in the input graph in Figure 1 (1i), both rules refer to the node $f(``\mathit{Luxemburg}")$ in the output graph in Figure 3. In consequence, this node has two labels, $\mathsf{City}$ and $\mathsf{Country}$. This, quite likely, is not what the user actually wants. We will later see how to fix it by adjusting the rules. $\blacktriangleleft$
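The label merging described in this example can be reproduced with a few lines of Python. The sketch below is illustrative only: the helper `add_node` is hypothetical, and the tuple of Skolem arguments is used directly as the node key, which is harmless here because $f$ is injective.

```python
from typing import Dict, Hashable, Set, Tuple

# The tuple of Skolem arguments stands in for the identifier f(arguments).
NodeKey = Tuple[Hashable, ...]


def add_node(labels_of: Dict[NodeKey, Set[str]],
             skolem_args: NodeKey,
             labels: Set[str]) -> None:
    """Creates the node if needed and adds the given labels to it."""
    labels_of.setdefault(skolem_args, set()).update(labels)


out: Dict[NodeKey, Set[str]] = {}

# First rule: Id = (l.countryName) with l = l1, label Country.
add_node(out, ("Luxemburg",), {"Country"})

# Second rule: Id = (a.cityName) with a = a1, label City.
add_node(out, ("Luxemburg",), {"City"})
```

After both calls, `out` contains a single node carrying both labels, which is the situation the example warns about.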

Property graphs are multigraphs and our rules allow specifying multiple edges with the same endpoints by using different arguments for the Skolem function. We will see an example in Section 8.

We refer to the right-hand side expressions in node (resp. edge) rules as node (resp. edge) constructors. We also allow rules of a more general form, illustrated in Figure 4, where a comma-separated list of node and edge constructors can be used on the right-hand side. We also support aliasing, with scope limited to a single rule. For instance, in the rule in Figure 4, we introduce the alias $x=(u)$ in the first edge constructor, and use it in the second edge constructor. Both these extensions are syntactic sugar. To eliminate aliases, we simply substitute them with their definitions: in the example, we replace $x$ in the second edge constructor with $(u)$. Then, we split the rules: for each node or edge constructor on the right-hand side, we create a separate rule with the same GPC pattern on the left.
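The two desugaring steps (alias substitution, then splitting) can be sketched as follows. The encoding is an assumption made for this illustration: a constructor is a tuple whose first component is `"node"` or `"edge"` and whose remaining components are identifier specifications, each either an alias name or an explicit tuple of Skolem arguments. Labels and properties are omitted for brevity, and the sample rule is hypothetical rather than a transcription of Figure 4.

```python
from typing import Dict, List, Tuple, Union

# An identifier specification: an alias name, or explicit Skolem arguments.
IdSpec = Union[str, Tuple[str, ...]]
# ("node", id) or ("edge", source_id, edge_id, target_id)
Constructor = Tuple


def desugar(pattern: str,
            aliases: Dict[str, Tuple[str, ...]],
            constructors: List[Constructor]
            ) -> List[Tuple[str, Constructor]]:
    """Substitutes aliases, then emits one rule per constructor."""

    def expand(spec: IdSpec) -> Tuple[str, ...]:
        # An alias (a string) is replaced by its definition.
        return aliases[spec] if isinstance(spec, str) else spec

    rules = []
    for kind, *ids in constructors:
        # Every resulting rule keeps the same left-hand side pattern.
        rules.append((pattern, (kind, *(expand(i) for i in ids))))
    return rules


# Hypothetical rule with alias x = (u) shared by two edge constructors.
rules = desugar(
    "P",
    {"x": ("u",)},
    [("edge", "x", (), ("l.countryName",)),
     ("edge", "x", (), ("a.cityName",))],
)
```

The result is two alias-free edge rules over the same pattern `"P"`, each with source identifier `("u",)`.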

5.5. Semantics


In this section, we describe operationally in Algorithm 1 how a transformation given as a set of node and edge rules turns an input property graph into an output property graph. In Section 8, we will see how to implement this efficiently in an existing graph database.

Given a GPC query $P(\bar{x})$, a content constructor $C(\bar{x})$ and a binding $\bar{o}$ for $P(\bar{x})$ over an input property graph $G$, we define $C.\mbox{Id}(\bar{o})$ by replacing in $C.\mbox{Id}$ each $x_{j}$ with $o_{j}$ and each $x_{j}.a$ with $\delta_{G}(o_{j},a)$. Similarly, we define $C.\mbox{Properties}(\bar{o})$ by replacing in $C.\mbox{Properties}$ each $x_{j}.a$ with $\delta_{G}(o_{j},a)$.

Algorithm 1 Semantics of a set of transformation rules.
1: Input: a property graph $G$ and a set of transformation rules $T$.
2: Output: an output of the transformation $T$ over $G$, a property graph $T(G)=\langle N,E,\lambda,\mathsf{src},\mathsf{tgt},\delta\rangle$.
3: initialize $T(G)$ to the empty property graph
4: for each edge rule $P(\bar{x})\implies(C_{\mathrm{s}}(\bar{x}))\xrightarrow{C(\bar{x})}(C_{\mathrm{t}}(\bar{x}))\in T$ do
5:      add rules $P(\bar{x})\implies(C_{\mathrm{s}}(\bar{x}))$ and $P(\bar{x})\implies(C_{\mathrm{t}}(\bar{x}))$ to $T$
6: for each node rule $P(\bar{x})\implies(C(\bar{x}))\in T$ do
7:      for each binding $\bar{o}\in\llbracket P\rrbracket^{\bar{x}}_{G}$ do
8:          $N\leftarrow N\cup\{o\coloneqq f(C.\mathrm{Id}(\bar{o}))\}$
9:          $\lambda(o)\leftarrow\lambda(o)\cup C.\mathrm{Labels}$
10:          set $\delta(o,k)$ to $c$ if $C.\mathrm{Properties}(\bar{o})$ sets property $k$ to $c$
11: for each edge rule $P(\bar{x})\implies(C_{\mathrm{s}}(\bar{x}))\xrightarrow{C(\bar{x})}(C_{\mathrm{t}}(\bar{x}))\in T$ do
12:      for each binding $\bar{o}\in\llbracket P\rrbracket^{\bar{x}}_{G}$ do
13:          $o_{\mathrm{s}}\leftarrow f(C_{\mathrm{s}}.\mathrm{Id}(\bar{o}));\ o_{\mathrm{t}}\leftarrow f(C_{\mathrm{t}}.\mathrm{Id}(\bar{o}))$
14:          $E\leftarrow E\cup\{o\coloneqq f(o_{\mathrm{s}},C.\mathrm{Id}(\bar{o}),o_{\mathrm{t}})\}$
15:          $\mathsf{src}(o)\leftarrow o_{\mathrm{s}};\ \mathsf{tgt}(o)\leftarrow o_{\mathrm{t}}$
16:          $\lambda(o)\leftarrow\lambda(o)\cup C.\mathrm{Labels}$
17:          set $\delta(o,k)$ to $c$ if $C.\mathrm{Properties}(\bar{o})$ sets property $k$ to $c$
(1) $\underset{\langle u.address=a.aid,\>u.address=\ell.aid\rangle}{(u:\mathsf{User}),(a:\mathsf{Address}),(\ell:\mathsf{Location})}\implies\underset{\langle name=u.name\rangle}{\left((u):\mathsf{Person}\right)}\xrightarrow{:\,\mathsf{HasLocation}}\underset{\langle name=\ell.countryName,\>code=\ell.countryCode\rangle}{\left((\ell.countryName):\mathsf{Country}\right)}$
(2) $\underset{\langle u.address=a.aid,\>u.address=\ell.aid\rangle}{(u:\mathsf{User}),(a:\mathsf{Address}),(\ell:\mathsf{Location})}\implies\underset{\langle name=u.name\rangle}{\left((u):\mathsf{Person}\right)}\xrightarrow{:\,\mathsf{HasAddress}}\underset{\langle name=a.cityName,\>code=a.cityCode\rangle}{\left((a.cityName):\mathsf{City}\right)}$
Figure 2. Transformation $T$ given as a set of rules.
Figure 3. Output property graph $T(G)$.
Example 5.5.

We describe, step by step, the operations carried out by Algorithm 1 on the input consisting of the property graph $G$ from Figure 1 (1i) and the transformation $T_1$, which contains only the first of the two rules in Figure 2.

First, the GPC query

$$P(u,a,\ell)\coloneqq\underset{\langle u.address=a.aid,\>u.address=\ell.aid\rangle}{(u:\mathsf{User}),(a:\mathsf{Address}),(\ell:\mathsf{Location})}$$

is executed on $G$ (only once in the entire process) and outputs the set of bindings $\llbracket P\rrbracket^{u,a,\ell}_{G}=\{(u\mapsto\mathsf{u1},a\mapsto\mathsf{a1},\ell\mapsto\mathsf{l1}),\>(u\mapsto\mathsf{u2},a\mapsto\mathsf{a2},\ell\mapsto\mathsf{l2})\}$.

In Line 5, the single edge rule of $T_1$ is split into node rules $P(\bar{x})\implies(C_{\mathrm{s}}(\bar{x}))$ and $P(\bar{x})\implies(C_{\mathrm{t}}(\bar{x}))$, where $C_{\mathrm{s}}$ and $C_{\mathrm{t}}$ have been defined in Examples 5.3 and 5.2, respectively. These two node rules are added to $T_1$, which initially contains no node rules.

Suppose that the node rule $P(\bar{x})\implies(C_{\mathrm{s}}(\bar{x}))$ is considered first in the loop in Line 6. Two output nodes are created with respective identifiers $f(u1)$ and $f(u2)$ (Line 8), one for each binding. Initially, they have no labels, $\lambda(f(u1))=\lambda(f(u2))=\emptyset$, and no properties. Then both get label $\mathsf{Person}$ (Line 9) and their property $name$ is set to "Jean" and "Robert", respectively (Line 10).

Next, the algorithm moves to the node rule $P(\bar{x})\implies(C_{\mathrm{t}}(\bar{x}))$. Two nodes are created in the output with respective identifiers $f(``\mathit{Luxemburg}")$ and $f(``\mathit{United\ States}")$ (Line 8), one for each binding; they both get label $\mathsf{Country}$ (Line 9); and their properties $name$ and $code$ are filled in (Line 10).

Finally, the algorithm steps through the only edge rule in $T_1$. For the first binding, the nodes corresponding to the endpoints of the edge that has to be created, namely $o_{\mathrm{s}}\coloneqq f(u1)$ and $o_{\mathrm{t}}\coloneqq f(``\mathit{Luxemburg}")$, are retrieved (Line 13). They correspond to the nodes that were created, from this binding, by the node rules that were added to $T_1$ in Line 5. An edge with id $f(f(u1),f(``\mathit{Luxemburg}"))$ is created (Line 14); its source and target are set to $f(u1)$ and $f(``\mathit{Luxemburg}")$, respectively (Line 15); it gets label $\mathsf{HasLocation}$ (Line 16); and no property is filled (Line 17). For the second binding, an edge is created by the same process between the nodes $f(u2)$ and $f(``\mathit{United\ States}")$. $\blacktriangleleft$
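The walkthrough above can be replayed with a minimal Python sketch of Algorithm 1. Several things here are assumptions made for illustration only: the Skolem function $f$ is modelled as a tagged tuple, constructors are encoded as dictionaries of callables, GPC evaluation is replaced by a precomputed list of bindings, and the country code of `l2` is made up. It is not the authors' openCypher implementation.

```python
def f(*args):
    """Stand-in for the injective Skolem function f: a tagged tuple of its arguments."""
    return ("f",) + args


def set_content(o, constructor, binding, lam, delta):
    """Lines 9-10 and 16-17: add labels, then set properties (the last write wins)."""
    lam.setdefault(o, set()).update(constructor["labels"])
    for k, c in constructor["props"](binding).items():
        delta[(o, k)] = c


def transform(bindings_of, node_rules, edge_rules):
    """Sketch of Algorithm 1. Node rules are (P, C); edge rules are (P, C_s, C, C_t)."""
    N, E, lam, src, tgt, delta = set(), set(), {}, {}, {}, {}
    # Lines 4-5: every edge rule contributes node rules for its two endpoints.
    node_rules = list(node_rules)
    for P, C_s, _, C_t in edge_rules:
        node_rules += [(P, C_s), (P, C_t)]
    # Lines 6-10: create nodes.
    for P, C in node_rules:
        for b in bindings_of(P):
            o = f(*C["id"](b))
            N.add(o)
            set_content(o, C, b, lam, delta)
    # Lines 11-17: create edges between already created endpoints.
    for P, C_s, C, C_t in edge_rules:
        for b in bindings_of(P):
            o_s, o_t = f(*C_s["id"](b)), f(*C_t["id"](b))
            o = f(o_s, *C["id"](b), o_t)
            E.add(o)
            src[o], tgt[o] = o_s, o_t
            set_content(o, C, b, lam, delta)
    return N, E, lam, src, tgt, delta


# Input property values; names and country names follow Example 5.5,
# while the country code "USA" is a made-up placeholder.
G = {
    "u1": {"name": "Jean"},
    "u2": {"name": "Robert"},
    "l1": {"countryName": "Luxemburg", "countryCode": "LUX"},
    "l2": {"countryName": "United States", "countryCode": "USA"},
}
# Bindings of P as listed in Example 5.5 (pattern matching itself is not modelled).
BINDINGS = [{"u": "u1", "a": "a1", "l": "l1"}, {"u": "u2", "a": "a2", "l": "l2"}]

C_s = {"id": lambda b: (b["u"],), "labels": {"Person"},
       "props": lambda b: {"name": G[b["u"]]["name"]}}
C_t = {"id": lambda b: (G[b["l"]]["countryName"],), "labels": {"Country"},
       "props": lambda b: {"name": G[b["l"]]["countryName"],
                           "code": G[b["l"]]["countryCode"]}}
C_e = {"id": lambda b: (), "labels": {"HasLocation"}, "props": lambda b: {}}

N, E, lam, src, tgt, delta = transform(lambda P: BINDINGS, [], [("P", C_s, C_e, C_t)])
```

In this sketch, nodes and edges that receive the same identifier are merged automatically because `N` and `E` are sets, which mirrors how the outputs of several rules are consolidated into a single graph.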

The role of Algorithm 1 is to give semantics to a set of transformation rules: it explains how the outputs of the multiple rules are consolidated into a single output property graph. The following result shows that our transformations are indeed graph-to-graph transformations, offering a way to meet the expected requirements of future versions of standard graph query languages (francis_researchers_2023).

Proposition 5.6.

Given an input property graph $G$ and a property graph transformation $T$, Algorithm 1 always returns a valid instance of the property graph data model.

Proof.

Given $T$ and $G$ with identifiers from $\mathcal{S}$, let $T(G)\coloneqq\langle N,E,\lambda,\mathsf{src},\mathsf{tgt},\delta\rangle$ be a property graph returned by Algorithm 1. We have to check that (i) both $N$ and $E$ are finite and disjoint, (ii) all elements have a finite number of labels, and (iii) every edge has exactly one source and one target.

(i) The set of bindings $\llbracket P\rrbracket^{\bar{x}}_{G}$ resulting from querying a property graph $G$ with the query $P(\bar{x})$ is assumed to be finite. Moreover, $T$ contains a finite number of rules; hence $N\cup E$ is finite. Similar reasoning shows that the label set of each element of $T(G)$ is finite, because each rule mentions only finitely many labels.

(ii) We now show that $N\cap E=\emptyset$. Let us assume that $o\in\mathcal{T}$ is both a node and an edge id in $T(G)$, resulting respectively from a node rule $R\coloneqq P(\bar{x})\implies(D)$ for $\bar{o}$ and an edge rule $S\coloneqq Q(\bar{y})\implies(C_{\mathrm{s}})\xrightarrow{C}(C_{\mathrm{t}})$ for $\bar{p}$. The injectivity of the Skolem function $f$ enforces that $o$ has been generated, in both cases, by using the same arguments. Moreover, by injectivity again, we necessarily have $D.\mathrm{Id}(\bar{o})=(o_{\mathrm{s}},C.\mathrm{Id}(\bar{p}),o_{\mathrm{t}})$ for some $o_{\mathrm{s}},o_{\mathrm{t}}\in N$, with $C.\mathrm{Id}\in\mathcal{A}^{k-2}_{\bar{x}}$. (Note that $o_{\mathrm{s}}$ and $o_{\mathrm{t}}$ have been respectively obtained from the source and target rules of $S$ for $\bar{p}$.) By definition of the range of the Skolem function $f$, $o_{\mathrm{s}}$ and $o_{\mathrm{t}}$ belong to $\mathcal{T}$; thus, they could not be equal to the first and last values of $D.\mathrm{Id}$. We conclude that $N\cap E=\emptyset$.

(iii) Finally, by injectivity of $f$, for an $o\in\mathcal{O}$ which is an edge id in $T(G)$, there are, by definition, exactly one $o_{\mathrm{s}}\in N$ and one $o_{\mathrm{t}}\in N$ which correspond to the source and the target of this edge. ∎
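The three conditions checked in the proof can also be stated as a small executable predicate. The sketch below is an assumption-laden illustration: it works on an in-memory encoding (Python sets and dictionaries, with tagged tuples standing in for values of $f$), and the function name is hypothetical. Finiteness is implicit, since Python sets and dictionaries are always finite.

```python
def is_valid_property_graph(N, E, lam, src, tgt):
    """Check conditions (i)-(iii) of the proof on an in-memory graph encoding."""
    # (i) node and edge identifiers are disjoint.
    disjoint = N.isdisjoint(E)
    # (ii) every element carries a (finite) set of labels; a missing entry means no labels.
    labelled = all(isinstance(lam.get(o, set()), (set, frozenset)) for o in N | E)
    # (iii) every edge has a source and a target, both of which are nodes;
    # a dictionary maps each edge to at most one value, so "exactly one" holds.
    endpoints = all(src.get(e) in N and tgt.get(e) in N for e in E)
    return disjoint and labelled and endpoints


# A two-node, one-edge fragment shaped like the output of Example 5.5.
person, country = ("f", "u1"), ("f", "Luxemburg")
edge = ("f", person, country)
nodes, edges = {person, country}, {edge}
labels = {person: {"Person"}, country: {"Country"}, edge: {"HasLocation"}}

ok = is_valid_property_graph(nodes, edges, labels, {edge: person}, {edge: country})
overlapping = is_valid_property_graph(nodes, {person}, labels, {person: person}, {person: country})
dangling = is_valid_property_graph(nodes, edges, labels, {edge: person}, {})
```

Here `ok` describes a well-formed graph, `overlapping` reuses a node identifier as an edge identifier, and `dangling` omits the target of the edge.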

Although Algorithm 1 always returns a valid property graph (Proposition 5.6), property values may depend on the order in which the rules and bindings are considered in Lines 6–7 and 11–12. Hence, the result of the transformation may be ill-defined on some inputs. We investigate this further in the next section.

6. Detecting conflicts

As one would expect from any expressive property graph transformation language, our formalism supports manipulating properties of output nodes and edges. Compared to purely structural mechanisms, such as (10.1145/3584372.3588654), this poses additional challenges.

Example 6.1.

Let us continue Example 5.5 by now considering the two-rule transformation T𝑇Titalic_T presented in Figure 2. The second rule gets split into two nodes rules, one of which is

P(x¯)((a.cityName):𝖢𝗂𝗍𝗒)name=a.cityName,code=a.cityCode.P(\bar{x})\implies\underset{\langle name=a.cityName,\>code=a.cityCode\rangle}{% \left((a.cityName):\mathsf{City}\right)}.italic_P ( over¯ start_ARG italic_x end_ARG ) ⟹ start_UNDERACCENT ⟨ italic_n italic_a italic_m italic_e = italic_a . italic_c italic_i italic_t italic_y italic_N italic_a italic_m italic_e , italic_c italic_o italic_d italic_e = italic_a . italic_c italic_i italic_t italic_y italic_C italic_o italic_d italic_e ⟩ end_UNDERACCENT start_ARG ( ( italic_a . italic_c italic_i italic_t italic_y italic_N italic_a italic_m italic_e ) : sansserif_City ) end_ARG .

Suppose that this node rule is processed in Line 6 after the two node rules discussed in Example 5.5. The algorithm attempts twice to create a node with identifier $f(\text{``}\mathit{Luxemburg}\text{''})$ (Line 8), once for each binding. However, a node with identifier $f(\text{``}\mathit{Luxemburg}\text{''})$ has already been created by the second node rule in Example 5.5. Consequently, the label $\mathsf{City}$ is added to this node (Line 9) and its properties $name$ and $code$ are set to $Luxemburg$ and $1457$, respectively (Line 10), overriding the previous values $Luxemburg$ and $LUX$. This means that one of the two values of property $code$ is lost, and which one depends on the processing order of the rules. Indeed, the mapping now conflates not only two cities called Luxemburg, as in Example 1.1, but also the country Luxemburg. This time, however, the error is easy to spot: looking at the rules we see immediately that the identity of the output nodes depends exclusively on the name of the city/country, which means that all cities and countries with the same name are conflated. We can fix the transformation easily by including information about the corresponding country in the identity of each $\mathsf{City}$ node, for instance by replacing $(a.cityName)$ with $(a.cityName,\ell.countryName)$ in rule (2) in Figure 2. $\blacktriangleleft$
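The order dependence described in this example can be illustrated with a minimal Python sketch. It is not Algorithm 1 itself: the helper `apply_node_rule`, the tuple-based identities, the label `Country`, and the sample bindings are assumptions made for illustration; only the values Luxemburg, LUX and 1457 come from the example.

```python
def apply_node_rule(output, bindings, identity, label, props):
    """Apply one node rule with last-writer-wins semantics.

    output   : dict mapping a node identity to {"labels": set, "props": dict}
    bindings : list of dicts, one per match of the rule's pattern
    identity : function binding -> hashable identity (constructor arguments)
    label    : label added to the node
    props    : dict property name -> function binding -> value
    Returns the overwritten values as (identity, key, old, new) tuples.
    """
    overwritten = []
    for b in bindings:
        node = output.setdefault(identity(b), {"labels": set(), "props": {}})
        node["labels"].add(label)
        for key, value_of in props.items():
            new = value_of(b)
            old = node["props"].get(key)
            if old is not None and old != new:
                overwritten.append((identity(b), key, old, new))
            node["props"][key] = new
    return overwritten


# Hypothetical bindings: one country and one city that share a name.
countries = [{"countryName": "Luxemburg", "countryCode": "LUX"}]
cities = [{"cityName": "Luxemburg", "cityCode": 1457}]

# Both rules build the identity from the name only (the modelling error).
country_rule = (lambda b: (b["countryName"],), "Country",
                {"name": lambda b: b["countryName"],
                 "code": lambda b: b["countryCode"]})
city_rule = (lambda b: (b["cityName"],), "City",
             {"name": lambda b: b["cityName"],
              "code": lambda b: b["cityCode"]})


def run(order):
    """Process (bindings, rule) pairs in the given order."""
    output, lost = {}, []
    for bindings, (identity, label, props) in order:
        lost += apply_node_rule(output, bindings, identity, label, props)
    return output, lost


out_a, lost_a = run([(countries, country_rule), (cities, city_rule)])
out_b, lost_b = run([(cities, city_rule), (countries, country_rule)])
```

Under these assumptions, the two processing orders leave different values in property `code` of the same output node, and `lost_a` and `lost_b` record which value was overwritten in each case.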

Detecting the modelling error in the rules in Figure 2 requires human insight (a basic understanding of geography), but we hope to make it easier by insisting on explicit identity specification in transformations. On the other hand, setting an output property to conflicting values is something one can try to capture abstractly and detect automatically. This is what we do next. In the remainder of this section we focus on detecting conflicts statically, by analysing a set of transformation rules to check if it can exhibit this pathological behavior on some input. We come back to handling conflicts dynamically in Section 8 and Section 9. Due to limited space, most proofs are moved to the appendix (TPG-github).

6.1. Consistency


7. Detecting conflicts

We formally define the notion of conflicts and provide the proofs for the results in Section 6.

7.1. Consistency checking

We now formalize the notion of node conflict. For any $k\geq 0$, let

\[ R\coloneqq P(\bar{x})\implies\underset{\langle\dots,a=v,\dots\rangle}{(A:L)}\mbox{ and }S\coloneqq Q(\bar{y})\implies\underset{\langle\dots,a=w,\dots\rangle}{(B:M)} \]

be two node rules in $T$ with $L,M\in\mathcal{P}_{fin}(\mathcal{L})$. These two node rules are potentially conflicting on the property $a$ when:

  • their respective argument lists have the same length, i.e., $A=(a_1,\dots,a_k)\in\mathcal{A}^k_{\bar{x}}$ and $B=(b_1,\dots,b_k)\in\mathcal{A}^k_{\bar{y}}$ for some $k\geq 0$;

  • their argument lists are compatible, which means that for each $1\leq i\leq k$, $a_i$ is a value argument if and only if $b_i$ is a value argument;

  • if $a_i$ and $b_i$ are respectively $x_i$ and $y_j$, then $\mathsf{sch}(P)(x_i)$ should be equal to $\mathsf{sch}(Q)(y_j)$;

  • they have potentially conflicting properties, which means that they both define the same property to (possibly) different values, i.e., $a\in\mathcal{K}$, $v\in\mathcal{V}_{\bar{x}}$ and $w\in\mathcal{V}_{\bar{y}}$.

A node conflict for a pair of potentially conflicting rules on the property $a$ occurs in a property graph $G$ whenever there exist $\bar{o}\in\llbracket P\rrbracket^{\bar{x}}_{G}$ and $\bar{p}\in\llbracket Q\rrbracket^{\bar{y}}_{G}$ with $A(\bar{o})=B(\bar{p})$ and $v(\bar{o})\neq w(\bar{p})$.
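The definition of a node conflict can be read as a check over pairs of answers. The following Python sketch is a simplified illustration under stated assumptions: answers are represented as plain dictionaries of values rather than bindings to graph elements, and the function `node_conflicts` and the sample data are hypothetical.

```python
from itertools import product


def node_conflicts(answers_p, answers_q, A, B, v, w):
    """Return all pairs (o, p) witnessing a node conflict.

    answers_p, answers_q : lists of bindings (dicts) returned by P and Q on G
    A, B                 : functions binding -> tuple, the identity arguments
    v, w                 : functions binding -> value assigned to property a
    """
    return [(o, p) for o, p in product(answers_p, answers_q)
            if A(o) == B(p) and v(o) != w(p)]


# Hypothetical answers of two patterns on some input graph.
answers_p = [{"x": "n1", "name": "Luxemburg", "code": "LUX"}]
answers_q = [{"y": "n2", "name": "Luxemburg", "code": 1457},
             {"y": "n3", "name": "Lyon", "code": 69}]

witnesses = node_conflicts(
    answers_p, answers_q,
    A=lambda o: (o["name"],), B=lambda p: (p["name"],),
    v=lambda o: o["code"], w=lambda p: p["code"],
)
```

A non-empty `witnesses` list means that the two rules assign different values to the same property of the same output node on this particular input; the static problem studied below asks whether such an input exists at all.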

We now formalize the notion of edge conflict. For any $k\geq 0$, let

\[ R\coloneqq P(\bar{x})\implies(A_{\mathrm{s}}:\,)\underset{\langle\dots,a=v,\dots\rangle}{\left[\xrightarrow{A\,:\,L}\right]}(A_{\mathrm{t}}:\,)\mbox{ and }S\coloneqq Q(\bar{y})\implies(B_{\mathrm{s}}:\,)\underset{\langle\dots,a=w,\dots\rangle}{\left[\xrightarrow{B\,:\,M}\right]}(B_{\mathrm{t}}:\,) \]

be two edge rules in $T$ with $L,M\in\mathcal{P}_{fin}(\mathcal{L})$. These two edge rules are potentially conflicting on the property $a$ when:

  • their respective argument lists have the same length, i.e., $A=(a_1,\dots,a_k)\in\mathcal{A}^k_{\bar{x}}$ and $B=(b_1,\dots,b_k)\in\mathcal{A}^k_{\bar{y}}$ for some $k\geq 0$;

  • their argument lists are compatible, which means that for each $1\leq i\leq k$, $a_i$ is a value argument if and only if $b_i$ is a value argument;

  • if $a_i$ and $b_i$ are respectively $x_i$ and $y_j$, then $\mathsf{sch}(P)(x_i)$ should be equal to $\mathsf{sch}(Q)(y_j)$;

  • the three previous points also apply to the pairs $(A_{\mathrm{s}},A_{\mathrm{t}})$ and $(B_{\mathrm{s}},B_{\mathrm{t}})$;

  • they have potentially conflicting properties, which means that they both define the same property to (possibly) different values, i.e., $a\in\mathcal{K}$, $v\in\mathcal{V}_{\bar{x}}$ and $w\in\mathcal{V}_{\bar{y}}$.

An edge conflict for a pair of potentially conflicting rules on the property $a$ occurs in a property graph $G$ whenever there exist $\bar{o}\in\llbracket P\rrbracket^{\bar{x}}_{G}$ and $\bar{p}\in\llbracket Q\rrbracket^{\bar{y}}_{G}$ with $A(\bar{o})=B(\bar{p})$, $A_{\mathrm{s}}(\bar{o})=B_{\mathrm{s}}(\bar{p})$, $A_{\mathrm{t}}(\bar{o})=B_{\mathrm{t}}(\bar{p})$ and $v(\bar{o})\neq w(\bar{p})$.
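The edge case differs from the node case only in that three identities must coincide: that of the edge itself and those of its source and target. A minimal Python sketch, again with answers simplified to dictionaries of values and with a hypothetical helper `edge_conflicts` and hypothetical data:

```python
def edge_conflicts(answers_p, answers_q, ids_r, ids_s, v, w):
    """Return all pairs (o, p) witnessing an edge conflict.

    ids_r = (A, A_s, A_t) and ids_s = (B, B_s, B_t) are triples of functions
    binding -> tuple; v and w give the value assigned to property a.
    """
    witnesses = []
    for o in answers_p:
        for p in answers_q:
            # Edge, source and target identities must all coincide.
            same_edge = all(f(o) == g(p) for f, g in zip(ids_r, ids_s))
            if same_edge and v(o) != w(p):
                witnesses.append((o, p))
    return witnesses


# Hypothetical answers; the edge identity uses k = 0 arguments, so an output
# edge is determined by its endpoints alone.
answers_p = [{"src": "a", "tgt": "b", "since": 2019}]
answers_q = [{"src": "a", "tgt": "b", "since": 2020},
             {"src": "a", "tgt": "c", "since": 2020}]
ids = (lambda b: (), lambda b: (b["src"],), lambda b: (b["tgt"],))

witnesses = edge_conflicts(answers_p, answers_q, ids, ids,
                           v=lambda b: b["since"], w=lambda b: b["since"])
```

Only the first answer of the second pattern yields a witness here, because the second one produces an edge with a different target.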

By a conflict we mean a situation in which Algorithm 1 resets a previously set property to a different value, as illustrated in Example 6.1. A transformation $T$ is consistent if for every input property graph $G$, no execution of Algorithm 1 results in a conflict. Note that even a transformation consisting of a single rule can be inconsistent, because different bindings for the same rule can cause a conflict.

We study the following fundamental static analysis problem, in the setting where there is no source schema constraining the set of possible input property graphs.

Consistency.

Given a transformation $T$, check if $T$ is consistent.

As we show next, consistency of transformations is deeply related to satisfiability of GPC patterns. A GPC pattern is satisfiable if it returns a non-empty set of answers on some property graph. Towards the goal of establishing complexity lower bounds for the consistency problem, we provide a polynomial-time reduction from the satisfiability problem for GPC.

Satisfiability.

Given a GPC pattern $P$, check if $P$ is satisfiable.

Lemma 7.1.

The satisfiability problem for GPC is PTime-reducible to the transformation consistency problem.

Proof.

For a GPC pattern $P$, let $T_P$ be the transformation consisting of the following two rules

\[ P()\implies\underset{\langle k=5\rangle}{((c):\ell)}\quad\mbox{ and }\quad P()\implies\underset{\langle k=7\rangle}{((c):\ell)} \]

for some fixed label $\ell$, constant $c$, and property name $k$. These rules are not conflicting with themselves, because their node constructors do not depend on the binding. However, they are conflicting with each other on a graph $G$ precisely when $P$ returns at least one answer on $G$. Hence, $T_P$ is consistent iff $P$ is not satisfiable. ∎
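The argument can be replayed on concrete inputs. The Python sketch below is an illustration only: the function `t_p_has_conflict` is hypothetical, and it takes the answers of $P$ on a graph as given instead of evaluating a GPC pattern.

```python
def t_p_has_conflict(answers_of_p):
    """Check whether T_P conflicts on a graph, given the answers of P on it.

    Both rules build the node with the constant identity ("c",); one sets
    k = 5 and the other sets k = 7. Every answer of P triggers both rules.
    """
    rule_values = (5, 7)
    assigned = set()
    for _binding in answers_of_p:
        for value in rule_values:
            assigned.add((("c",), "k", value))
    # A conflict means two different values for property k of the same node.
    return len({value for (_id, _key, value) in assigned}) > 1


no_answer = t_p_has_conflict([])       # P has no answer on the graph
one_answer = t_p_has_conflict([()])    # P returns the empty tuple once
```

With no answer, neither rule fires and no property is ever set; with at least one answer, both values are assigned to the same property of the same node.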

For the converse of Lemma 7.1 to hold we need to move to GPC+, a simple extension of GPC with projection and union (10.1145/3584372.3588662).

Lemma.

The transformation consistency problem is PTime-reducible to the satisfiability problem for GPC+.

Proof sketch.

For a pair of rules $R$ and $S$, and an attribute $a$, we can write a Boolean GPC+ query $Q_{R,S,a}()$ that detects if some matches of $R$ and $S$ lead to different values for attribute $a$ in the same element of the output graph. Because there are polynomially many such triples, we can take the union of all such queries to obtain the final GPC+ query to be checked for satisfiability.

Proof.

Recall that a node conflict for a pair of potentially conflicting rules $R$ and $S$ on a property $a$ occurs on a property graph $G$ whenever there exist $\bar{o}\in\llbracket P\rrbracket^{\bar{x}}_{G}$ and $\bar{p}\in\llbracket Q\rrbracket^{\bar{y}}_{G}$ with $A(\bar{o})=B(\bar{p})$ and $v(\bar{o})\neq w(\bar{p})$. We can rewrite all of those conditions in a single boolean GPC query:

\[ Q_{(R,S,a)}()\coloneqq P(\bar{x}),Q(\bar{y}),A=B,v\neq w \]

which is satisfied on a property graph $H$ iff this specific conflict occurs on $H$. Note that we assume w.l.o.g. in the following construction that $\bar{x}$ and $\bar{y}$ are disjoint sets of variables.

Similarly, for an edge conflict, we obtain a single boolean GPC query with the same properties:

\[ Q_{(R,S,a)}()\coloneqq P(\bar{x}),Q(\bar{y}),A=B,A_{\mathrm{s}}=B_{\mathrm{s}},A_{\mathrm{t}}=B_{\mathrm{t}},v\neq w \]

We provide an example of the GPC pattern encoding $v\neq w$. Assume that $v\coloneqq x_i.b$ and $w\coloneqq y_j.c$, with $\mathsf{sch}(P)(x_i)=\mathsf{Node}$ and $\mathsf{sch}(Q)(y_j)=\mathsf{Edge}$. Then the following join query encodes $v\neq w$:

\[ \underset{\langle\neg\left(x_i.b=y_j.c\right)\rangle}{\left[(x_i),\left(\right)\xrightarrow{y_j}\left(\right)\right]}. \]

Similarly, we can apply this construction pointwise and take the join of the resulting patterns to encode $A=B$, $A_{\mathrm{s}}=B_{\mathrm{s}}$ and $A_{\mathrm{t}}=B_{\mathrm{t}}$.

Finally, to wrap up the proof, it is easy to see that given a property graph transformation $T$, there are at most polynomially many triples $(R,S,a)$ satisfying these criteria, so we can take the union of all the $Q_{(R,S,a)}()$ as the final GPC+ query on which to check for satisfiability. ∎
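The polynomial bound on the number of triples can be made concrete by enumerating them. The Python sketch below is a simplification under stated assumptions: rules are reduced to their kind and the set of properties they set, the compatibility of argument lists is not checked, and the function `candidate_triples` and the three sample rules are hypothetical.

```python
from itertools import combinations_with_replacement


def candidate_triples(rules):
    """List triples (R, S, a) that need a conflict query Q_(R,S,a).

    rules : dict rule name -> (kind, set of property names),
            with kind in {"node", "edge"}
    A rule is also paired with itself, since two bindings of one rule may
    conflict. Only "same kind, shared property" is checked here.
    """
    triples = []
    for r, s in combinations_with_replacement(sorted(rules), 2):
        kind_r, props_r = rules[r]
        kind_s, props_s = rules[s]
        if kind_r == kind_s:
            triples += [(r, s, a) for a in sorted(props_r & props_s)]
    return triples


# Hypothetical transformation with three rules.
rules = {
    "R1": ("node", {"name", "code"}),
    "R2": ("node", {"name"}),
    "R3": ("edge", {"since"}),
}
triples = candidate_triples(rules)
```

The number of triples is bounded by the number of unordered rule pairs times the largest number of properties set by a single rule, so the union query has size polynomial in the size of $T$.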

We now turn to study the complexity of the satisfiability problem for GPC and GPC+. The two lemmas above will allow us to draw conclusions for the consistency problem in Section 7.4.

7.2. The complexity of satisfiability


7.3. GPC satisfiability

In Theorem 7.3, we establish that checking if a GPC+ query is satisfiable is a PSpace-complete problem (modulo certain assumptions on the use of restrictors). We believe that this result is interesting in its own right, beyond the application to transformation consistency we consider in this paper. Indeed, deciding whether a query expressed in a given query language is satisfiable is a fundamental problem in database theory. Very little is known to date about GPC from a theoretical viewpoint, and our work is one of the first to tackle a key static analysis task related to this query language.

Lemma.

The satisfiability problem for GPC is PSpace-hard.

Proof sketch.

We show how to reduce the membership problem for an arbitrary PSpace language to the satisfiability of a GPC query. Let $L$ be a language in PSpace and $M$ a deterministic polynomial-space Turing machine that recognizes $L$ in space $c\cdot p(n)$ for a fixed constant $c$ and polynomial $p$. In the following, $n$ denotes the length of the word $w$ which is an input to $M$.

We construct the following GPC pattern $P$:

\[ P()\coloneqq\rho\>(x)_{\langle\theta_1\rangle}\left(\left[(u)\,\rightarrow\,(v)\right]_{\langle\theta_2\rangle}\right)^{1..\infty}(y)_{\langle\theta_3\rangle} \]

The intuition is the following. We can represent a configuration of $M$ in a single node, using a polynomial number of properties. The pattern $(x)_{\langle\theta_1\rangle}$ is responsible for encoding the initial configuration of $M$ over the input word $w$. The pattern $\left[(u)\,\rightarrow\,(v)\right]_{\langle\theta_2\rangle}$ ensures that there exists a valid transition of $M$ between the configurations represented by nodes $u$ and $v$. Finally, $(y)_{\langle\theta_3\rangle}$ specifies that node $y$ represents an accepting configuration.

We can use techniques similar to the proof of the Cook-Levin Theorem (10.5555/574848) to construct in time polynomial in $n$ the formulæ $\theta_1$, $\theta_2$, and $\theta_3$. The size of $P$ is then clearly polynomial in $n$. This reduction works with any $\rho \in \{\mathsf{shortest}, \mathsf{simple}, \mathsf{trail}\}$.

Proof.

Let $M = (Q, \Sigma, s, F, \delta)$ be the TM that recognizes $L$ in deterministic polynomial space. Let $w$ be an input word of length $n$. Assume that $M$ works over $w$ using at most $c \cdot p(n)$ tape cells, for a fixed constant $c$ and a polynomial $p$.

We build a GPC query using the following set of properties $\mathcal{K}_0 \subset \mathcal{K}$, which contains all the following elements:

  • $pos_{(i,\sigma)}$; the tape contains symbol $\sigma \in \Sigma$ at position $1 \leq i \leq c \cdot p(n)$;

  • $head_{(i)}$; the head of the TM is in position $1 \leq i \leq c \cdot p(n)$;

  • $q$; the TM is in state $q \in Q$.

Notice that we will be using only two constant values: $0$ and $1$.
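As a quick sanity check on the claim that a polynomial number of properties suffices, the size of $\mathcal{K}_0$ can be counted directly from the three families listed above. This count is our own summary of the list and is not a formula stated elsewhere in the proof:

```latex
\[
  |\mathcal{K}_0|
  \;=\; \underbrace{c \cdot p(n) \cdot |\Sigma|}_{pos_{(i,\sigma)}}
  \;+\; \underbrace{c \cdot p(n)}_{head_{(i)}}
  \;+\; \underbrace{|Q|}_{\text{states}}
  \;=\; c \cdot p(n) \cdot \left(|\Sigma| + 1\right) + |Q| .
\]
```

Since $|\Sigma|$ and $|Q|$ are fixed by $M$, this quantity is polynomial in $n$.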

To encode the consistency of a state represented by the set of properties of an element $x$, we will use formula $\theta_c(x)$ defined as the conjunction of the following formulas:

  • $\neg\left(x.pos_{(i,a)} = 1\right) \vee \neg\left(x.pos_{(i,b)} = 1\right)$; for $1 \leq i \leq c \cdot p(n)$, $a, b \in \Sigma$, $a \neq b$;

  • $\neg\left(x.head_{(i)} = 1\right) \vee \neg\left(x.head_{(j)} = 1\right)$; for $1 \leq i \neq j \leq c \cdot p(n)$;

  • $\neg\left(x.q = 1\right) \vee \neg\left(x.q' = 1\right)$; for $q, q' \in Q$, $q \neq q'$;

  • $\left(x.k = 1\right) \vee \left(x.k = 0\right)$; for $k \in \mathcal{K}_0$.
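The consistency formula can be read as an executable check on a record. The following Python sketch evaluates $\theta_c$ on a dictionary; the key encoding (`("pos", i, a)`, `("head", i)`, `("state", q)`) and the function names are hypothetical choices made for illustration, not part of the construction itself.

```python
from itertools import product


def property_keys(sigma, states, cells):
    """Enumerate the keys of K_0 (hypothetical tuple encoding of the names)."""
    keys = [("pos", i, a) for i, a in product(range(1, cells + 1), sigma)]
    keys += [("head", i) for i in range(1, cells + 1)]
    keys += [("state", q) for q in states]
    return keys


def theta_c(record, sigma, states, cells):
    """Evaluate the four families of conjuncts of theta_c on a record."""
    # (x.k = 1) or (x.k = 0): every key of K_0 is defined and holds 0 or 1.
    if any(record.get(k) not in (0, 1) for k in property_keys(sigma, states, cells)):
        return False
    # Pairwise "not both 1" is the same as "at most one 1" in each group.
    for i in range(1, cells + 1):
        if sum(record[("pos", i, a)] for a in sigma) > 1:
            return False  # two different symbols in cell i
    if sum(record[("head", i)] for i in range(1, cells + 1)) > 1:
        return False  # two head positions
    return sum(record[("state", q)] for q in states) <= 1  # at most one state
```

Note that, exactly as in the formula, the check only enforces "at most one" per group; the "at least one" part comes from the other formulas that use $\theta_c$.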

$\theta_1(x)$ is a conjunction of the following formulas; it ensures that the set of properties of the element pointed to by $x$ encodes the initial configuration of the TM $M$ over $w$:

  • $x.pos_{(i,a)} = 1$; if $w_i = a$, for $1 \leq i \leq n$; $w$ is stored at the beginning of the tape;

  • $x.pos_{(i,\square)} = 1$; for $n < i \leq c \cdot p(n)$, where $\square$ is the blank symbol; the rest of the tape is filled with blank symbols;

  • $\left(x.head_{(1)} = 1\right) \wedge \left(x.s = 1\right)$; initialisation of both head and state;

  • $\theta_c(x)$; consistency check.
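Because $\theta_c$ forces every property to be $0$ or $1$ and forbids two $1$s in the same group, the conjuncts of $\theta_1$ determine the record over $\mathcal{K}_0$ completely. A minimal Python sketch that builds this record is given below; the tuple keys, the `"_"` blank symbol, and the function name are assumptions made for illustration.

```python
def initial_record(w, sigma, states, start, cells, blank="_"):
    """Build the only record over K_0 that satisfies theta_1 for input w.

    Assumes len(w) <= cells and that `blank` belongs to `sigma`.
    """
    # w is stored at the beginning of the tape, the rest is blank.
    tape = list(w) + [blank] * (cells - len(w))
    record = {}
    for i in range(1, cells + 1):
        for a in sigma:
            record[("pos", i, a)] = 1 if tape[i - 1] == a else 0
        # The head starts on the first cell.
        record[("head", i)] = 1 if i == 1 else 0
    for q in states:
        # The machine starts in its initial state s.
        record[("state", q)] = 1 if q == start else 0
    return record
```

For instance, `initial_record("ab", ["a", "b", "_"], ["s", "f"], "s", 3)` sets one `pos` key per cell, `("head", 1)`, and `("state", "s")` to 1, and everything else to 0.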

$\theta_2(u, v)$ checks that the configuration stored in the record of $v$ can be obtained from $u$ in a single computation step of $M$; it consists of the conjunction of the following formulas:

  • $\left(u.head_{(i)} = 0 \wedge u.pos_{(i,a)} = 1\right) \implies v.pos_{(i,a)} = 1$; for $1 \leq i \leq c \cdot p(n)$, $a \in \Sigma$; the tape remains unchanged unless written by the head;

  • $\left(u.head_{(i)} = 1 \wedge u.pos_{(i,a)} = 1 \wedge u.q = 1\right) \implies \left(v.pos_{(i,b)} = 1 \wedge v.head_{(i+d)} = 1 \wedge v.q' = 1\right)$; for $1 \leq i \leq c \cdot p(n)$, $a, b \in \Sigma$, $q, q' \in Q$, $d \in \{-1, 0, 1\}$, when $(q', b, d) \in \delta(q, a)$; there is a valid transition;

  • $\theta_c(u) \wedge \theta_c(v)$; consistency checks.
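To make the role of each conjunct of $\theta_2$ concrete, here is a small Python sketch that evaluates it on two records. It assumes a deterministic machine whose transition function is given as a dictionary `delta[(q, a)] = (q2, b, d)`; the tuple keys and the helpers `encode` and `consistent` are hypothetical and only serve the illustration.

```python
def encode(tape, head, state, sigma, states):
    """Record of the configuration (tape contents, head position, state)."""
    r = {}
    for i, sym in enumerate(tape, start=1):
        for a in sigma:
            r[("pos", i, a)] = int(sym == a)
        r[("head", i)] = int(i == head)
    for q in states:
        r[("state", q)] = int(q == state)
    return r


def consistent(r, sigma, states, cells):
    """theta_c: all keys are 0/1, at most one 1 per cell, head group, state group."""
    idx = range(1, cells + 1)
    keys = ([("pos", i, a) for i in idx for a in sigma]
            + [("head", i) for i in idx] + [("state", q) for q in states])
    return (all(r.get(k) in (0, 1) for k in keys)
            and all(sum(r[("pos", i, a)] for a in sigma) <= 1 for i in idx)
            and sum(r[("head", i)] for i in idx) <= 1
            and sum(r[("state", q)] for q in states) <= 1)


def theta_2(u, v, sigma, states, cells, delta):
    """Check that v follows from u in one step of the machine."""
    if not (consistent(u, sigma, states, cells) and consistent(v, sigma, states, cells)):
        return False
    for i in range(1, cells + 1):
        for a in sigma:
            if u[("pos", i, a)] != 1:
                continue
            # Frame condition: cells away from the head keep their symbol.
            if u[("head", i)] == 0 and v[("pos", i, a)] != 1:
                return False
            # Transition: the cell under the head is rewritten, the head moves.
            for q in states:
                if u[("head", i)] == 1 and u[("state", q)] == 1 and (q, a) in delta:
                    q2, b, d = delta[(q, a)]
                    # A move outside the tape has no head key, so it fails.
                    if not (v[("pos", i, b)] == 1 and v.get(("head", i + d)) == 1
                            and v[("state", q2)] == 1):
                        return False
    return True
```

With `delta = {("s", "a"): ("f", "b", 1)}`, the configuration `encode("a_", 1, "s", ...)` is followed by `encode("b_", 2, "f", ...)` and by no configuration that leaves the first cell unchanged.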

Finally, $\theta_3$ checks that we have reached an accepting configuration:

\[ \theta_3(y) \coloneqq \bigvee_{q \in F} y.q = 1 \]

It is clear that the pattern $P$, which is constructed in polynomial time given an input word $w$, is satisfiable if and only if there exists an accepting run for $M$ over $w$ using at most a polynomial amount of space, i.e., iff $w \in L \in \textsc{PSpace}$. ∎

For completeness, we also provide the matching upper bound (under some assumptions) and obtain Theorem 7.3 as a result. The details of the upper-bound proof are highly technical and the claim depends heavily on the design choices made for GPC.

Theorem.

The satisfiability problem for GPC+ queries using only the $\mathsf{simple}$ and $\mathsf{trail}$ restrictors is PSpace-complete.

Proof.

By Lemma 7.3, it only remains to prove the upper bound. Let $Q$ be a GPC query. We assume a mild syntactic restriction: all edge and node patterns must mention a variable in their descriptor. This can be achieved by picking a fresh variable when none is specified in a descriptor.

We denote by $\mathsf{Const}_0$, $\mathcal{K}_0$ and $\mathcal{L}_0$, respectively, the sets of constants, keys and labels mentioned in $Q$. Additionally, let $d$ be the number of occurrences of conditions of the form $x.a = c$ or $x.a = y.b$ in the formula. We extend the set $\mathsf{Const}_0$ with $d + 1$ fresh distinct constants. Let also $\mathcal{X}$ and $\mathcal{Y}$ be, respectively, the sets of variables of type $\mathsf{Node}$ and $\mathsf{Edge}$ in the schema of $Q$.

Preliminary remarks.

We start with some key observations:

  • It is clear that we cannot always guess an answer made up of a path and an assignment because some patterns are only satisfiable by paths of exponential length: e.g., we can simulate a counter to count up to $2^n$ with $n$ properties and a polynomial-sized formula similar to that used in Lemma 7.3;

  • The concatenation of patterns implies an implicit equality over the endpoints; thus, we need to book-keep the endpoints of a pattern alongside an assignment for it;

  • The semantics and the typing rules of GPC isolate the variables under a repetition pattern; this means that, in a repetition pattern, we cannot refer to externally defined variables (i.e., non-local occurrences are not permitted); moreover, because conditions are only defined over singleton variables (i.e., variables of type $\mathsf{Node}$ or of type $\mathsf{Edge}$ in the schema of $Q$), we cannot refer in conditions to variables appearing under a repetition sub-pattern. Thus, we can assume that, if a query is satisfiable, an answer path for a pattern inside a repetition pattern can be disjoint (except for its endpoints) from the answer paths for the outside of the repetition pattern.
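The first observation can be illustrated with a short Python sketch of the counting argument. It assumes, hypothetically, a pattern whose first node has all $n$ bit properties set to $0$, whose last node has them all set to $1$, and whose step condition forces a binary increment; the function only enumerates the nodes that any matching path must then contain and is not the formula used in the paper.

```python
def counter_path(n):
    """Successive values of an n-bit counter stored in n 0/1 properties."""
    bits = [0] * n  # least significant bit first
    path = [tuple(bits)]
    while not all(bits):
        # Binary increment: clear the trailing 1s, then set the next bit.
        i = 0
        while bits[i] == 1:
            bits[i] = 0
            i += 1
        bits[i] = 1
        path.append(tuple(bits))
    return path
```

The returned list has $2^n$ pairwise distinct entries, so a witness path cannot be written down explicitly in polynomial space, which is why the procedure below guesses it block by block.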

This shows that we can store in polynomial space an extended assignment corresponding to a valid answer to a path pattern $\pi$. By extended, we mean that we additionally store the two endpoints of the answer path in variables named $\mathsf{src}_\pi$ and $\mathsf{tgt}_\pi$ (that we assume to belong to $\mathcal{X}$); additionally, $\mathsf{Group}$ and $\mathsf{Maybe}$ variables will not be tracked because they can no longer be mentioned in conditions. In this case, an assignment $\eta(\cdot)$ stores for each variable the description of an element, which is made of its label set (referred to by $\lambda(\eta(\cdot))$) and its record (referred to by $\delta(\eta(\cdot), \cdot)$); it does not contain an identifier for this element.

Saturation procedure and consistency check.

In the body of the main algorithm we make use of a procedure called saturation. The main idea of this procedure is to propagate equality and inequality constraints between variables whenever new ones are found. We illustrate why this procedure is needed and how it works on the following pattern:

\[ (u)_{\langle u.a = 1\rangle} \xrightarrow{\;z\;} (v) \xrightarrow{\;z\;} (w)_{\langle w.a = 2\rangle} \]

Because both occurrences of $z$ must map to the same edge, the constraints over the endpoints of $z$ enforce equalities between the variables $u$, $v$ and $w$. In turn, these equalities imply that $u.a = w.a$, which is in conflict with the requirements of the pattern. Note that if we remove the conditions $u.a = 1$ and $w.a = 2$, this pattern remains unsatisfiable w.r.t. the $\mathsf{simple}$ semantics; the forced repetition of nodes in bindings of this pattern was not explicit before applying the saturation procedure because no node variable was reused. However, reusing the variable $z$ makes the pattern explicitly unsatisfiable under $\mathsf{trail}$ semantics.
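The chain of inferences in this example can be spelled out as follows. The notation $\mathsf{src}(z)$ and $\mathsf{tgt}(z)$ for the endpoints of the edge bound to $z$ is introduced here only for the illustration:

```latex
\begin{align*}
  \mathsf{src}(z) = u \ \text{and}\ \mathsf{src}(z) = v
    &\implies u = v
    && \text{(first and second occurrence of $z$)} \\
  \mathsf{tgt}(z) = v \ \text{and}\ \mathsf{tgt}(z) = w
    &\implies v = w \\
  u = v \ \text{and}\ v = w
    &\implies u = w
    && \text{(transitivity)} \\
  u = w
    &\implies u.a = w.a
    && \text{(same element, same record)} \\
  u.a = 1 \ \text{and}\ w.a = 2
    &\implies u.a \neq w.a
    && \text{(contradiction)}
\end{align*}
```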

To formalize this, we introduce the notion of an equality graph $G$ for a query $Q$ (or a pattern $\pi$), which is a $2$-layer undirected edge-labeled graph with nodes in each layer that respectively belong to the sets $\mathcal{Y}$ and $\mathcal{X}$. There are two edge labels, $=$ and $\neq$. Edges can only connect nodes from the same layer. If $G$ is an equality graph for a pattern $\pi$, there are two distinguished nodes $\mathsf{src}_\pi, \mathsf{tgt}_\pi \in \mathcal{X}$ which are respectively called the source and the target, and abbreviated $\mathsf{src}, \mathsf{tgt}$ if clear from the context. This structure supports the following set of operations:

Saturation:

The first step of the saturation procedure over a pattern $\pi$ is to equate the endpoints of edge variables. Let $e$ be an edge variable in $G$. If $G$ contains two distinct node variables $x$ and $y$ that are both the source, or both the target, of occurrences of $e$ in $\pi$, then add $x = y$ to $G$.

In a second step, it computes the transitive closure of the $=$-edges of the graph at layer $\mathcal{X}$.

After obtaining a $=$-transitive graph, we pass on the inequalities: if there is a $\neq$-edge between, say, $x$ and $y$, and if both $x' = x$ and $y' = y$ in $G$, then we add $x' \neq y'$ to $G$.

Finally, it does a backward step to pull up the inequalities: if there is a $\neq$-edge between two elements in $\mathcal{X}$ and if both are either the source or the target of some edges in $Q$, then it adds an inequality edge between these two edge variables.

Check consistency:

Given an assignment for all the variables mentioned in the equality graph, check if all equalities are satisfied in this assignment: if $x$ and $y$ are two variables with $x = y$ in $G$, check if their assignments are strictly the same. Moreover, check that there is no conflict in the graph: a conflict occurs when there are both a $=$-edge and a $\neq$-edge between the same pair of nodes of $G$, or when there is a $\neq$-loop.

Merge:

Given two equality graphs $G_1$ and $G_2$, merging both on nodes $x$ and $y$ for policy $\rho$ consists in:

  1. taking the union of their vertices and edges;

  2. adding a $=$-edge between $x$ and $y$, if $x$ and $y$ are given as parameters;

  3. if $\rho = \mathsf{simple}$, adding a $\neq$-edge from each node in the layer $\mathcal{X}$ of $G_1$ not $=$-connected to $\mathsf{tgt}$, to each node of the layer $\mathcal{X}$ of $G_2$ not $=$-connected to $\mathsf{src}$ (this is consistent with the semantics of the concatenation $\rho_1 \cdot \rho_2$, where the target of $\rho_1$ must be equal to the source of $\rho_2$);

  4. if $\rho = \mathsf{trail}$, adding a $\neq$-edge from each node of the layer $\mathcal{Y}$ of $G_1$, to each node of the layer $\mathcal{Y}$ of $G_2$;

  5. applying the saturation procedure.

Note that the result of this operation is also a valid equality graph and that $x$, $y$ and $\rho$ are optional parameters.

The maximum number of nodes and edges in an equality graph for $Q$ is polynomial in the size of $Q$, and the three procedures can be implemented in polynomial time.
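The three operations can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that are ours: a graph is a dictionary with the layers `X` and `Y`, sets of `eq` and `neq` pairs, and the list `occ` of edge occurrences `(edge_var, source_var, target_var)`; the endpoint variables of a single edge pattern double as $\mathsf{src}$ and $\mathsf{tgt}$; and the pull-up step is read as "two occurrences whose sources (or whose targets) are known to differ bind different edges".

```python
from itertools import combinations


def eq_class(eq, start):
    """Variables reachable from `start` through =-edges (including itself)."""
    seen, todo = {start}, [start]
    while todo:
        v = todo.pop()
        for a, b in eq:
            for p, q in ((a, b), (b, a)):
                if p == v and q not in seen:
                    seen.add(q)
                    todo.append(q)
    return seen


def saturate(g):
    """Propagate equalities and inequalities inside the graph g (in place)."""
    occ = g["occ"]
    # Step 1: all occurrences of one edge variable share source and target.
    for (e1, s1, t1), (e2, s2, t2) in combinations(occ, 2):
        if e1 == e2:
            g["eq"] |= {(a, b) for a, b in ((s1, s2), (t1, t2)) if a != b}
    # Step 2: symmetric and transitive closure of = on the node layer X.
    g["eq"] = {(x, y) for x in g["X"] for y in eq_class(g["eq"], x) if x != y}
    # Step 3: pass inequalities on to all equal variables.
    g["neq"] = {(a, b) for x, y in g["neq"]
                for a in eq_class(g["eq"], x) for b in eq_class(g["eq"], y)}
    # Step 4: pull inequalities up from endpoints to edge variables.
    for (e1, s1, t1), (e2, s2, t2) in combinations(occ, 2):
        if {(s1, s2), (s2, s1), (t1, t2), (t2, t1)} & g["neq"]:
            g["neq"].add((e1, e2))
    return g


def check_consistency(g, assignment):
    """All =-edges hold in the assignment and no =/!= conflict or !=-loop exists."""
    same = all(assignment[x] == assignment[y] for x, y in g["eq"])
    clash = any(x == y or (x, y) in g["eq"] or (y, x) in g["eq"]
                for x, y in g["neq"])
    return same and not clash


def merge(g1, g2, x=None, y=None, rho=None):
    """Union of g1 and g2, optional x = y, policy inequalities, then saturation."""
    g = {"X": g1["X"] | g2["X"], "Y": g1["Y"] | g2["Y"],
         "eq": g1["eq"] | g2["eq"], "neq": g1["neq"] | g2["neq"],
         "occ": g1["occ"] + g2["occ"], "src": g1["src"], "tgt": g2["tgt"]}
    if x is not None and y is not None:
        g["eq"].add((x, y))
    if rho == "simple":
        left = g1["X"] - eq_class(g1["eq"], g1["tgt"])
        right = g2["X"] - eq_class(g2["eq"], g2["src"])
        g["neq"] |= {(a, b) for a in left for b in right}
    elif rho == "trail":
        g["neq"] |= {(a, b) for a in g1["Y"] for b in g2["Y"]}
    return saturate(g)


def edge_graph(u, z, v):
    """Equality graph of a single edge pattern from u to v bound to z."""
    return {"X": {u, v}, "Y": {z}, "eq": set(), "neq": set(),
            "occ": [(z, u, v)], "src": u, "tgt": v}
```

On the example pattern, `merge(edge_graph("u", "z", "v"), edge_graph("v", "z", "w"))` derives $u = w$, so any assignment giving different records to $u$ and $w$ is rejected; with `rho="trail"` the merge produces a $\neq$-loop on $z$, and with `rho="simple"` it produces $\neq$-loops on the node variables, so the consistency check fails for every assignment.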

Inductive procedure for patterns.

In the following, we describe a non-deterministic polynomial space procedure to check if a GPC pattern query $\rho\,\pi$ is satisfiable. This inductive procedure over the structure of $\pi$ succeeds if and only if $\pi$ is satisfiable under policy $\rho$. It returns a pair consisting of an extended assignment and an equality graph over the variables in the assignment:

  • Case $\pi \coloneqq (x : \ell)$. Guess a node element with a label set $L \subseteq \mathcal{L}_0$ s.t. $\ell \in L$ and a record consisting of a partial assignment from the keys in $\mathcal{K}_0$ to $\mathsf{Const}_0$. Return the pair consisting of the extended assignment binding $\mathsf{src}_\pi$, $\mathsf{tgt}_\pi$ and $x$ to this node; and of the equality graph containing three nodes: $\mathsf{src}_\pi$, $\mathsf{tgt}_\pi$ and $x$, and two edges: $\mathsf{src}_\pi = x$ and $\mathsf{tgt}_\pi = x$.

  • Case $\pi \coloneqq \xrightarrow{\;y\,:\,\ell\;}$. Guess an edge element with a label set $L \subseteq \mathcal{L}_0$ s.t. $\ell \in L$ and a record consisting of a partial assignment from the keys in $\mathcal{K}_0$ to $\mathsf{Const}_0$. Return one of the following two possibilities:

    • Return a pair consisting of the extended assignment binding $y$ to this edge, and $\mathsf{src}_\pi$ and $\mathsf{tgt}_\pi$ to two arbitrarily guessed endpoint nodes (which act as the source and target of $y$); the equality graph contains these three elements and a $\neq$-edge between $\mathsf{src}_\pi$ and $\mathsf{tgt}_\pi$.

    • (Loop; if $\rho$ is not $\mathsf{simple}$) Return a pair consisting of the extended assignment binding $y$ to this edge, and $\mathsf{src}_\pi$ and $\mathsf{tgt}_\pi$ to the same arbitrarily guessed endpoint node; the equality graph contains these three elements and a $=$-edge between $\mathsf{src}_\pi$ and $\mathsf{tgt}_\pi$.

  • Case $\pi \coloneqq \pi_1 + \pi_2$. Guess $i \in \{1, 2\}$ and return the extended assignment and equality graph obtained by a recursive call to this procedure on $\pi_i$, after removing the variables that do not appear in $\pi_{3-i}$. (This is because those variables are of type $\mathsf{Maybe}(\cdot)$ in $\pi$.)

  • Case $\pi \coloneqq \pi_1 \pi_2$. Perform a recursive call to this procedure on both $\pi_1$ and $\pi_2$ to obtain an extended assignment and an equality graph for $\pi_i$, $i \in \{1, 2\}$. Check whether they unify (i.e., check whether the assignments of $\pi_1$ and $\pi_2$ are strictly the same on their common variables). Then, merge the two equality graphs on $\mathsf{tgt}_{\pi_1}$ and $\mathsf{src}_{\pi_2}$ under the policy $\rho$; and check consistency w.r.t. the unified extended assignment. Return the pair consisting of the unified extended assignment and the merged equality graph after removing $\mathsf{tgt}_{\pi_1}$ and $\mathsf{src}_{\pi_2}$ and renaming $\mathsf{src}_{\pi_1}$ to $\mathsf{src}_\pi$ and $\mathsf{tgt}_{\pi_2}$ to $\mathsf{tgt}_\pi$.

  • Case $\pi \coloneqq {\pi_{1}}_{\langle\theta\rangle}$. Perform a recursive call to this procedure on $\pi_{1}$ to obtain an extended assignment and an equality graph for $\pi_{1}$. Check the validity of $\theta$ (as defined in the Semantics of conditioned patterns in (10.1145/3584372.3588662)) over the extended assignment of $\pi_{1}$ and return the same extended assignment and equality graph.

  • Case $\pi \coloneqq \pi_{1}^{n..m}$. Guess a length $k$ between $n$ and $m$. (Note that, by Lemma 7.2, $k$ can be assumed to be at most exponential in the size of the whole query if $m$ is $\infty$; hence, it can be written in binary using a polynomial amount of space.) Perform $k$ successive recursive calls to this procedure on $\pi_{1}$. Each time, drop from the obtained extended assignment and equality graph all but the $\mathsf{src}$ and $\mathsf{tgt}$ variables and the equality or inequality edge between them. Check whether they concatenate with the block obtained so far (i.e., perform the case $\pi \coloneqq \pi_{1}\pi_{2}$). Return a pair consisting of an extended assignment binding the $\mathsf{src}$ of the very first guess to $\mathsf{src}_{\pi}$ and the $\mathsf{tgt}$ of the very last guess to $\mathsf{tgt}_{\pi}$, and of the equality graph containing only the $\mathsf{src}_{\pi}$ and $\mathsf{tgt}_{\pi}$ nodes, with an $=$ or a $\neq$ edge between them when one is forced.
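The merge-and-check step used in the concatenation case can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: the class `EqualityGraph`, the function `glue`, and all variable names are hypothetical, and the restrictor-specific part of the merge (the policy $\rho$), as well as the final removal and renaming of endpoints, is deliberately left out. Equalities are kept in a union-find structure, and a graph is treated as consistent when no $\neq$ edge links two variables of the same equality class.

```python
class EqualityGraph:
    """Hypothetical stand-in: variables linked by '=' and '!=' edges."""

    def __init__(self, equalities=(), inequalities=()):
        self.parent = {}           # union-find forest for '=' edges
        self.inequalities = set()  # pairs (x, y) carrying a '!=' edge
        for x, y in equalities:
            self.add_equal(x, y)
        for x, y in inequalities:
            self.add_unequal(x, y)

    def find(self, x):
        """Return the representative of x's equality class."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_equal(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        self.parent[root_x] = root_y

    def add_unequal(self, x, y):
        self.find(x)  # register both variables
        self.find(y)
        self.inequalities.add((x, y))

    def is_consistent(self):
        """No '!=' edge may connect two variables forced to be equal."""
        return all(self.find(x) != self.find(y) for x, y in self.inequalities)


def glue(g1, g2, tgt1, src2):
    """Union of g1 and g2, identifying the target of g1 with the source of g2."""
    merged = EqualityGraph()
    for g in (g1, g2):
        for x in list(g.parent):
            merged.add_equal(x, g.find(x))
        for x, y in g.inequalities:
            merged.add_unequal(x, y)
    merged.add_equal(tgt1, src2)
    return merged


# First block forces src1 != tgt1; second block forces src2 = tgt2.
left = EqualityGraph(inequalities=[("src1", "tgt1")])
right = EqualityGraph(equalities=[("src2", "tgt2")])
merged = glue(left, right, "tgt1", "src2")
```

Here `merged` is consistent and places `tgt1`, `src2`, and `tgt2` in one class; additionally forcing `src1 = tgt2` would make it inconsistent, which is the situation the consistency check is meant to reject.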

Example.

We illustrate how our procedure works for repetition patterns with the following example:

\[
\pi \coloneqq \mathsf{simple}\left(\left[(u) \rightarrow (v)\right]_{\langle \neg\left(u.a = v.a\right)\rangle}\right)^{0..\infty}
\]

We write $\pi_{1} \coloneqq \left[(u) \rightarrow (v)\right]_{\langle \neg\left(u.a = v.a\right)\rangle}$. There are three different types of behavior depending on $k$:

  • If $k = 0$, it returns a pair consisting of an extended assignment binding $\mathsf{src}_{\pi}$ and $\mathsf{tgt}_{\pi}$ to an arbitrarily guessed node element; and of an equality graph containing the edge $\mathsf{src}_{\pi} = \mathsf{tgt}_{\pi}$.

  • If $k = 1$, it returns a pair consisting of an extended assignment binding $\mathsf{src}_{\pi}$ and $\mathsf{tgt}_{\pi}$ to two node elements having different values for $a$ (if both are set); and of the equality graph containing $\mathsf{src}_{\pi} \neq \mathsf{tgt}_{\pi}$.

  • If $k \geq 2$, it returns a pair consisting of an extended assignment binding $\mathsf{src}_{\pi}$ and $\mathsf{tgt}_{\pi}$ to two arbitrary node elements; and of the equality graph containing $\mathsf{src}_{\pi} \neq \mathsf{tgt}_{\pi}$.
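The three behaviors can be checked by brute force on a small concrete graph. The sketch below is only an illustration and makes several assumptions that are not taken from the paper: `simple_matches` and `endpoint_edge` are hypothetical helpers, nodes are plain integers, edges are ordered pairs, the dictionary `prop_a` holds the value of property $a$ where it is defined, and $\mathsf{simple}$ is read as "no node is visited twice", which agrees with the $\neq$ edge reported for $k \geq 1$.

```python
def simple_matches(nodes, edges, prop_a, max_len):
    """Enumerate node sequences of simple paths with at most max_len edges
    in which every step (u) -> (v) satisfies not(u.a = v.a)."""

    def step_ok(u, v):
        # u.a = v.a holds only when both values are defined and equal,
        # so its negation also holds when a value is missing.
        both_defined = u in prop_a and v in prop_a
        return not (both_defined and prop_a[u] == prop_a[v])

    paths = [[n] for n in nodes]  # k = 0: a single node
    frontier = list(paths)
    for _ in range(max_len):
        frontier = [
            path + [v]
            for path in frontier
            for (u, v) in edges
            if u == path[-1] and v not in path and step_ok(u, v)
        ]
        paths.extend(frontier)
    return paths


def endpoint_edge(path):
    """Edge that the equality graph would carry between src and tgt."""
    return "=" if path[0] == path[-1] else "!="


nodes = [1, 2, 3]
edges = [(1, 2), (2, 3), (3, 1)]
prop_a = {1: "x", 2: "x"}  # node 3 has no value for a
paths = simple_matches(nodes, edges, prop_a, max_len=3)
```

In this toy graph the step from node 1 to node 2 is rejected because both carry the same value for $a$, while steps touching node 3 are accepted because its value is undefined. Every path of length 0 yields `"="` and every longer path yields `"!="`.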

Invariant.

Let $\pi$ be a pattern matched under a restrictor $\rho \in \{\mathsf{simple}, \mathsf{trail}\}$. The pair $(\eta, G_{\pi})$, consisting of an extended assignment and an equality graph for $\pi$, is an output of the procedure iff there exist a property graph $P$ and an answer $(p,\mu) \in \llbracket \pi \rrbracket_{P}$ (the set of answers to $\pi$ on $P$ (10.1145/3584372.3588662)) such that, for all $x$ s.t. $\mathsf{sch}(\pi)(x) \in \{\mathsf{Node}, \mathsf{Edge}\}$, we have:

  • $\lambda(\mu(x)) = \lambda(\eta(x))$;

  • for every key (property) $a$, $\delta(\mu(x),a) = \delta(\eta(x),a)$, or both are undefined;

  • if $x = y$ in $G_{\pi}$ then $\mu(x) = \mu(y)$; similarly, if $x \neq y$ in $G_{\pi}$ then $\mu(x) \neq \mu(y)$.

In the following, we provide the key ideas for proving this invariant by induction:

  • Case $\pi \coloneqq (x : \ell)$ and case $\pi \coloneqq \xrightarrow{y\,:\,\ell}$ are trivial as $P$ can be obtained directly from $\eta$.

  • Case $\pi \coloneqq \pi_{1} + \pi_{2}$ follows by applying the induction hypothesis to either side. Note that $\pi$ may contain fewer singleton variables than $\pi_{i}$. (This is because the variables of $\pi_{i}$ that do not appear in $\pi_{3-i}$ are of type $\mathsf{Maybe}(\cdot)$ in $\pi$.)

  • Case $\pi \coloneqq \pi_{1}\pi_{2}$ relies on the merge procedure to propagate the forced equalities and inequalities.

  • Case $\pi \coloneqq {\pi_{1}}_{\langle\theta\rangle}$ follows by noticing that we only need to check whether the condition holds over the extended assignment because conditions only apply to singleton variables. (This is enforced by the Typing rules for the GPC type system presented in Figure 2 of (10.1145/3584372.3588662).) Notice that $\neg(u.a = v.a)$ does not lead to $u \neq v$ because either $u.a$ or $v.a$ may be undefined; thus, we do not need to update $G$. (Again, this is because of the Semantics of conditioned patterns of (10.1145/3584372.3588662), where $\mu \models (x.a = y.b)$ iff $\delta(\mu(x),a)$ and $\delta(\mu(y),b)$ are defined and equal.)

  • Case $\pi \coloneqq \pi_{1}^{n..m}$ holds because $\pi$ does not contain any singleton variables. Hence, only a potential equality or inequality between its endpoints is tracked throughout the $k$ iteration steps over $\pi_{1}$.
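The remark on conditions in the case of conditioned patterns can be made concrete with a small sketch of the equality test. It is an illustration under assumptions: `eq_holds` is a hypothetical helper, $\delta$ is modeled as a dictionary keyed by (element, key) pairs, and `None` is assumed never to be a stored property value.

```python
def eq_holds(delta, x, a, y, b):
    """x.a = y.b holds iff both property values are defined and equal."""
    vx = delta.get((x, a))
    vy = delta.get((y, b))
    return vx is not None and vy is not None and vx == vy


# n1 and n2 both have a = 1; n0 has no value for a.
delta = {("n1", "a"): 1, ("n2", "a"): 1}

# not(u.a = v.a) with u and v bound to the SAME node n0: satisfied,
# because a is undefined there. Hence the condition cannot force u != v.
same_node_undefined = not eq_holds(delta, "n0", "a", "n0", "a")

# With a defined, the negated condition fails on a single node...
same_node_defined = not eq_holds(delta, "n1", "a", "n1", "a")

# ...and also on two distinct nodes that share the value.
distinct_nodes_equal = not eq_holds(delta, "n1", "a", "n2", "a")
```

Only `same_node_undefined` is true, which is why no inequality edge is added to the equality graph when such a condition is checked.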

Extension to queries (with join and conditioning).

Let $Q \coloneqq Q_{1}, Q_{2}$ be a join query with $Q_{1}$ and $Q_{2}$ matched under restrictors $\mathsf{simple}$ or $\mathsf{trail}$; and let the pairs $(\eta_{i}, G_{Q_{i}})$, $i \in \{1,2\}$, be returned by the previous procedure on each $Q_{i}$. The procedure (for $Q$) checks whether the $\eta_{i}$’s unify (i.e., whether the $\eta_{i}$’s agree on their common variables), and whether the result of merging the $G_{Q_{i}}$’s remains consistent.

Note that this procedure supports conditioning over join queries (i.e., we add the query expression $Q \coloneqq Q_{\langle\theta\rangle}$) by checking whether the condition is valid over $\eta$, similarly to what is done for conditioning in patterns.
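The unification check on extended assignments can be sketched in a few lines. The representation below is an assumption made for illustration only: an extended assignment is a dictionary from variable names to a (label, properties) pair, the labels `Person` and `City` are invented, and `unify` is a hypothetical helper name.

```python
def unify(eta1, eta2):
    """Return the union of two extended assignments, or None if they
    disagree on a variable that both of them bind."""
    for x in eta1.keys() & eta2.keys():
        if eta1[x] != eta2[x]:
            return None
    return {**eta1, **eta2}


# Each value is (label, tuple of (key, value) property pairs).
eta1 = {"x": ("Person", (("a", 1),)), "y": ("City", ())}
eta2 = {"y": ("City", ()), "z": ("Person", ())}
eta3 = {"x": ("Person", (("a", 2),))}

agree = unify(eta1, eta2)  # y is shared and bound identically
clash = unify(eta1, eta3)  # x is shared but a = 1 versus a = 2
```

When unification succeeds, the equality graphs of the two sides are merged and checked for consistency; when it fails, as for `clash`, this nondeterministic branch is rejected.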

Extension to GPC+.

Simply guess a GPC query among all the disjuncts, and check its satisfiability using the previous procedure.

Shortest restrictor.

Note that the invariant is not valid if the $\mathsf{shortest}$ restrictor is used. For instance, consider the following pattern, which is not supported by the above procedure:

\[
\begin{split}
&\mathsf{shortest}\,\mathsf{simple}\,(x)\left[(\,) \rightarrow \underset{\langle a=1\rangle}{(u)} \rightarrow (\,) \;+\; (\,) \rightarrow (\,) \rightarrow (u) \rightarrow (\,) \rightarrow (\,)\right](y),\\
&\mathsf{simple}\,(x) \rightarrow \underset{\langle a=1\rangle}{(w)} \rightarrow (y),\\
&\mathsf{simple}\,\underset{\langle a=2\rangle}{(u)}
\end{split}
\]

Nevertheless, we can easily prove that the above procedure works when all patterns in a GPC query $Q$ that are matched under the $\mathsf{shortest}$ restrictor have only their endpoints as singleton variables. ∎

{toappendix}
Lemma 7.2.

Let $\pi\coloneqq\pi_{1}^{\infty}$ be a sub-pattern of a GPC query $\rho\,\pi_{0}$. If there is an answer path for $\pi$, then there is another answer path for $\pi$ consisting of at most $k$ repetitions of $\pi_{1}$, with $k$ exponential in the size of $\pi_{0}$, and with all answer paths for $\pi_{1}$ being inner disjoint.

Proof.

There are at most exponentially many distinct records for nodes with values in $\mathsf{Const}_{0}$, keys in $\mathcal{K}_{0}$ and labels in $\mathcal{L}_{0}$. Given the semantics of repeated patterns in (10.1145/3584372.3588662), only the target node of an iteration has an impact on the next iteration, by being its source; only this information is transferred across successive repetitions of $\pi_{1}$. Thus, we can reduce the number of repetitions of $\pi_{1}$ in the initial answer path because one target node necessarily gets repeated. We can moreover assume w.l.o.g. that all answer paths to $\pi$ are inner disjoint, by taking disjoint copies of the initial answer paths. ∎
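The pigeonhole step of this argument can be illustrated with a small sketch. It assumes that each repetition of $\pi_{1}$ is summarised only by the record of its target node; the function name `shorten_repetitions` and the tuple encoding of records are illustrative choices, not part of the paper. Cutting out the iterations between two occurrences of the same record leaves a sequence without repeated records, so its length is bounded by the number of distinct records.

```python
def shorten_repetitions(target_records):
    """Drop the iterations between two occurrences of the same record.

    target_records[i] is a hashable summary of the target node reached by
    the i-th repetition. The result contains no record twice and, when the
    input is non-empty, ends with the same record as the input.
    """
    kept = []        # records of the iterations we keep, in order
    position = {}    # record -> its index in `kept`
    for record in target_records:
        if record in position:
            # Same record as an earlier target: discard the loop in between.
            cut = position[record] + 1
            for dropped in kept[cut:]:
                del position[dropped]
            del kept[cut:]
        else:
            position[record] = len(kept)
            kept.append(record)
    return kept
```

For instance, on the sequence `["a", "b", "c", "b", "d"]` the loop `c, b` is removed and the shortened sequence is `["a", "b", "d"]`.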

We prove the upper bound of Theorem 7.3 by inductively constructing an equality type over all variables in the query. This non-deterministic procedure uses only a polynomial amount of space by avoiding storing the full match of the pattern. Unfortunately, this does not extend to queries using the $\mathsf{shortest}$ restrictor: they seem to require storing the full match. We leave open the question of pinpointing the exact complexity of satisfiability for such queries.

Given the high complexity lower bounds, one might wonder whether there are useful subclasses of GPC with tractable satisfiability. In Lemma 7.3 below, we show that even under strong limitations, satisfiability is still intractable.

{lemmarep}

The satisfiability problem is NP-hard even for single-node GPC patterns.

Proof.

We reduce 3-SAT to the satisfiability of a GPC query. Let $F$ be the following 3-SAT formula over $n$ variables and $m$ clauses:

\[
\bigwedge_{1\leq i\leq m}\left(l_{i1}\vee l_{i2}\vee l_{i3}\right)
\]

where for all $1\leq i\leq m$ and $j\in\{1,2,3\}$, $l_{ij}$ is a literal which is either $x_{k}$ or $\bar{x}_{k}$ for some $k\in\{1,\dots,n\}$.

We construct the following GPC query $Q(x)\coloneqq\rho\>\pi_{\langle\theta\rangle}$ with $\pi\coloneqq(x:\ell)$ and

\begin{align*}
\theta\coloneqq{} &\bigwedge_{1\leq i\leq n}\Big(\left(x_{i}=1\wedge\bar{x}_{i}=0\right)\vee\left(x_{i}=0\wedge\bar{x}_{i}=1\right)\Big)\\
\wedge{} &\bigwedge_{1\leq j\leq m}\ \bigvee_{(b_{1},b_{2},b_{3})\in C^{3}}\left(l_{j1}=b_{1}\wedge l_{j2}=b_{2}\wedge l_{j3}=b_{3}\right)
\end{align*}

where $C^{3}=\{(0,0,1),\dots,(1,1,1)\}$, that is, all triples over $\{0,1\}$ except $(0,0,0)$; $x$ and $\ell$ are optional in $\pi$; the literals of $F$ are used as property names in $Q$; and the size of $Q$ is clearly polynomial in the size of $F$.

We now show that $F$ is satisfiable if and only if there exists a property graph $G$ on which $Q(x)$ returns at least one node.

($\Rightarrow$) Assume that $F$ is satisfied by an assignment $\nu$ to $\bar{x}$. We construct a property graph containing a unique $\ell$-labeled node with identifier $o$, having the following record:

\[
\forall i\in\{1,\dots,n\}\ \left\{\begin{array}{lr}
x_{i}\mapsto 1,\ \bar{x}_{i}\mapsto 0 & \text{if }\nu_{i}=\top\\
x_{i}\mapsto 0,\ \bar{x}_{i}\mapsto 1 & \text{if }\nu_{i}=\bot
\end{array}\right\}.
\]

By design, the top-most conjunct of $\theta$ is satisfied. Let $c_{i}$, for any $1\leq i\leq m$, be a clause in $F$; by hypothesis $c_{i}$ is satisfied by $\nu$, so there exists $(b_{1},b_{2},b_{3})\in C^{3}$ such that $\delta(o,l_{ij})=b_{j}$ for all $1\leq j\leq 3$.

($\Leftarrow$) Conversely, if $Q(x)$ is satisfied in a property graph $G$ for an element $o$, then $o\in N$ and each of $x_{i},\bar{x}_{i}$ is defined in the record of $o$. The restrictions enforced by the top-most conjunct of $\theta$ ensure that we can retrieve a well-defined assignment for $F$; the last conjunct ensures that this assignment satisfies $F$. ∎
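The reduction can be checked on small formulas with the following sketch. It is only an illustration under stated assumptions: literals are encoded as signed integers ($k$ for $x_{k}$, $-k$ for $\bar{x}_{k}$), the property names `x1`, `not_x1`, ... and all function names are hypothetical, and the search for a satisfying record is a brute-force enumeration over one-node graphs with property values in $\{0,1\}$ (other values cannot satisfy the first conjunct of $\theta$).

```python
from itertools import product

# All triples over {0, 1} except (0, 0, 0): the set C^3 of the proof.
C3 = [t for t in product((0, 1), repeat=3) if any(t)]


def prop(literal):
    """Property name of a literal: k stands for x_k, -k for its negation."""
    return f"x{literal}" if literal > 0 else f"not_x{-literal}"


def theta(record, n, clauses):
    """Evaluate the condition theta of the reduction on a node record."""
    consistent = all(
        (record.get(prop(i)) == 1 and record.get(prop(-i)) == 0)
        or (record.get(prop(i)) == 0 and record.get(prop(-i)) == 1)
        for i in range(1, n + 1)
    )
    clauses_ok = all(
        any(all(record.get(prop(lit)) == b for lit, b in zip(clause, bits))
            for bits in C3)
        for clause in clauses
    )
    return consistent and clauses_ok


def record_from_assignment(assignment):
    """Record of the single node built in the left-to-right direction."""
    record = {}
    for i, value in assignment.items():
        record[prop(i)] = 1 if value else 0
        record[prop(-i)] = 0 if value else 1
    return record


def formula_satisfiable(n, clauses):
    """Brute-force 3-SAT check, used only as a reference."""
    for values in product((False, True), repeat=n):
        nu = dict(zip(range(1, n + 1), values))
        if all(any(nu[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def query_satisfiable(n, clauses):
    """Search for a one-node graph whose record satisfies theta."""
    names = [prop(s * i) for i in range(1, n + 1) for s in (1, -1)]
    return any(theta(dict(zip(names, bits)), n, clauses)
               for bits in product((0, 1), repeat=len(names)))
```

On the formula $(x_{1}\vee x_{2}\vee x_{3})\wedge(\bar{x}_{1}\vee\bar{x}_{2}\vee\bar{x}_{3})$, encoded as `[(1, 2, 3), (-1, -2, -3)]`, both checks succeed; on `[(1, 1, 1), (-1, -1, -1)]` both fail, as the equivalence predicts.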

7.4. Back to consistency

From Theorem 7.3, Lemma 7.1, and Lemma 7.1, we obtain the following fundamental result.

Corollary 7.3.

The consistency problem is PSpace-complete for transformations using only $\mathsf{simple}$ and $\mathsf{trail}$ restrictors.

In fact, the PSpace lower bound holds already for transformations using only two rules and any single restrictor. From Lemma 7.3 and Lemma 7.1 it follows that the problem remains intractable even for transformations using very restricted GPC queries.

In the light of these high complexity lower bounds, it is unlikely that conflict detection can be handled statically in practice. This means that conflicts have to be handled dynamically, when the transformation is executed. In Section 8 we discuss how this can be implemented in practice and in Section 9 we show experimentally that the incurred overhead is affordable.

8. Translation to Cypher

Algorithm 1 can be seen as an abstraction of a transformation engine: it takes a transformation and an input property graph, and produces an output property graph. In this section we show how to compile a transformation to an openCypher script that can be directly executed in any openCypher engine. This is similar in spirit to executable SQL scripts for relational schema mappings, scalable and efficient in producing target solutions (bernstein_model_2007).

We first discuss the overall complexity of Algorithm 1. Lines 8 and 14 involve a set-theoretic union and, without appropriate optimization, their cost is proportional to the current number of elements in $T(G)$ in each iteration of the loop. Lines 9–10 and 15–17 can be implemented in $\mathcal{O}(1)$ provided that Lines 8 and 14 return a pointer to the element $o\coloneqq f(C.\mathrm{Id}(\bar{o}))\in T(G)$. Thus the overall complexity of Algorithm 1 on input $G$ is:

\begin{equation}
\mathcal{O}\left(t_{int}+n_{c}\cdot Int(G,T)\cdot|T(G)|\right) \tag{3}
\end{equation}

where $n_{c}$ is the total number of content constructors in $T$, and $Int(G,T)$ and $t_{int}$ are respectively the total size of all intermediate results $\llbracket P\rrbracket_{G}^{\bar{x}}$ and the overall running time for computing $\llbracket P\rrbracket_{G}^{\bar{x}}$, with $P(\bar{x})$ ranging over all left-hand sides of rules in $T$.

Thus, the total time taken by Algorithm 1 implemented naively is quadratic in the size of the property graphs, which makes it practically unusable for large input instances. However, the complexity heavily depends on the implementation of the set-theoretic unions.
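To make the quadratic behaviour concrete, here is a minimal sketch of a set-theoretic union implemented by a linear scan over the output built so far. It is plain Python rather than the paper's implementation, and the names `naive_merge` and `total_comparisons` are hypothetical. It only counts identifier comparisons, which is the part of the cost that depends on $|T(G)|$.

```python
def naive_merge(store, identifier):
    """Union by linear scan; returns (element, number of comparisons)."""
    comparisons = 0
    for element in store:
        comparisons += 1
        if element["_id"] == identifier:
            return element, comparisons      # already present: reuse it
    element = {"_id": identifier}            # absent: create it
    store.append(element)
    return element, comparisons


def total_comparisons(identifiers):
    """Total comparisons made when merging the identifiers one by one."""
    store, total = [], 0
    for identifier in identifiers:
        _, comparisons = naive_merge(store, identifier)
        total += comparisons
    return total
```

Merging $n$ pairwise distinct identifiers costs $0+1+\dots+(n-1)=n(n-1)/2$ comparisons, for example 4950 for $n=100$.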

\begin{align*}
\underset{\langle u.address=a.aid,\>u.address=\ell.aid\rangle}{(u:\mathsf{User}),(a:\mathsf{Address}),(\ell:\mathsf{Location})}\implies{}
&\underset{\langle name=u.name\rangle}{\left(x=(u):\mathsf{Person}\right)}\ \xrightarrow{(\mathsf{HasLocation})\,:\,\mathsf{HasLocation}}\ \underset{\langle name=\ell.countryName,\>code=\ell.countryCode\rangle}{\left((\ell.countryName):\mathsf{Country}\right)},\\
&(x)\ \xrightarrow{(\mathsf{HasAddress})\,:\,\mathsf{HasAddress}}\ \underset{\langle name=a.cityName,\>code=a.cityCode\rangle}{\left((a.cityName):\mathsf{City}\right)}
\end{align*}
Figure 4. Refined property graph transformation $T_{\mathrm{r}}$.

Plain implementation. In Figure 5 we showcase the result of our translation strategy for the variant $T_{\mathrm{r}}$ of $T$, presented in Figure 4. This transformation has only one rule and is translated into a single executable script. For transformations with several rules, each rule of the transformation is independently translated into a script.

Cypher's built-in \mintinline{cypher}|elementId| primitive provides access to the identifier of an element, which is unique among all elements in the database. It plays a crucial role in our implementation, as we use these identifiers as arguments to the Skolem functions generating output identifiers. To the best of our knowledge, there is no explicit control over the creation of new identifiers in Neo4j, so we equip nodes and edges in the output graph with a special property \mintinline{cypher}|_id| that plays the role of a controllable element identifier.

The \mintinline{cypher}|MATCH| clauses at the top of the script correspond to the left-hand side of the rule and are responsible for retrieving the necessary information from the input property graph. Recall that, in Line 5 of Algorithm 1, a node rule is added for each endpoint of every edge constructor in the transformation. Accordingly, in the openCypher script, each node constructor used on the right-hand side of the rule is considered separately, with one \mintinline{cypher}|MERGE| clause followed by one \mintinline{cypher}|SET| clause per node constructor. Similarly to how Skolem functions are usually implemented in relational data exchange for schema mapping tasks (bernstein_model_2007), we implement them with string operations, e.g., \mintinline{cypher}|_id: "(" + elementId(u) + ")"|. We rely on the semantics of Cypher's \mintinline{cypher}|MERGE| clause, described in (green_updating_2019), to implement the set-theoretic union: each \mintinline{cypher}|MERGE| on a node pattern checks whether an element with this identifier already exists in the graph; either one exists and is retrieved, or a new element is created. Adding the corresponding label(s) to the retrieved node (Line 9 of Algorithm 1) is implemented with Cypher's native \mintinline{cypher}|SET| clause that follows each of these \mintinline{cypher}|MERGE| clauses. Similarly, the properties of the nodes (Line 10 of Algorithm 1) are set in the same \mintinline{cypher}|SET| clauses.

Finally, the relationships are created by the last two \mintinline{cypher}|MERGE| clauses of the script. To keep the value of \mintinline{cypher}|_id| unique among all elements in the output, and given the restriction that relationships hold a single label in Neo4j, the edge labels have been provided as arguments to the Skolem functions in Figure 4. Note that, when we merge an edge pattern, we are sure that the endpoints already exist in the database.

\begin{minted}[xleftmargin=2em, linenos=true]{cypher}
// Left-hand side of the rule: retrieve the matching input elements.
MATCH (u:User)
MATCH (a:Address) WHERE a.aid = u.address
MATCH (l:Location) WHERE l.aid = u.address
// One MERGE/SET group per node constructor of the right-hand side.
MERGE (x:_dummy { _id: "(" + elementId(u) + ")" })
SET x:Person,
    x.name = u.name
MERGE (y:_dummy { _id: "(" + l.countryName + ")" })
SET y:Country,
    y.name = l.countryName, y.code = l.countryCode
MERGE (z:_dummy { _id: "(" + a.cityName + ")" })
SET z:City,
    z.name = a.cityName, z.code = a.cityCode
// Relationships: the edge label is part of the Skolem identifier.
MERGE (x)-[hl:HasLocation {
    _id: "(" + elementId(x) + "," + "HasLocation" + "," + elementId(y) + ")" }]->(y)
MERGE (x)-[ha:HasAddress {
    _id: "(" + elementId(x) + "," + "HasAddress" + "," + elementId(z) + ")" }]->(z)
\end{minted}

Figure 5. openCypher script corresponding to $T_{\mathrm{r}}$ (Figure 4).

We point out that the \mintinline{cypher}|_id| property and the \mintinline{cypher}|_dummy| label are internal data; they are of no interest to the end user and can be dropped after the transformation with Cypher's \mintinline{cypher}|REMOVE| command.
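The translation strategy for the single rule of Figure 4 can be mimicked outside the database with a short sketch. This is an illustration under assumptions, not the paper's code: input elements are plain dictionaries, the field `id` stands in for the value returned by `elementId`, edge identifiers are built from the output identifiers of their endpoints, and the names `skolem`, `merge` and `apply_rule` are hypothetical.

```python
def skolem(*args):
    """Build an output identifier by string concatenation, e.g. "(u1)"."""
    return "(" + ",".join(str(a) for a in args) + ")"


def merge(store, identifier):
    """Return the element with this identifier, creating it if absent."""
    return store.setdefault(identifier, {"labels": set(), "props": {}})


def apply_rule(users, addresses, locations):
    """Apply the rule to toy input; returns (nodes, edges) keyed by id."""
    nodes, edges = {}, {}
    for u in users:
        for a in addresses:
            if a["aid"] != u["address"]:
                continue
            for loc in locations:
                if loc["aid"] != u["address"]:
                    continue
                # One merge per node constructor of the right-hand side.
                targets = [
                    (skolem(u["id"]), "Person", {"name": u["name"]}),
                    (skolem(loc["countryName"]), "Country",
                     {"name": loc["countryName"],
                      "code": loc["countryCode"]}),
                    (skolem(a["cityName"]), "City",
                     {"name": a["cityName"], "code": a["cityCode"]}),
                ]
                for identifier, label, props in targets:
                    node = merge(nodes, identifier)
                    node["labels"].add(label)
                    node["props"].update(props)
                x_id, y_id, z_id = (t[0] for t in targets)
                # The edge label is an argument of the edge identifier.
                for label, target in (("HasLocation", y_id),
                                      ("HasAddress", z_id)):
                    edge_id = skolem(x_id, label, target)
                    edges.setdefault(edge_id, (x_id, label, target))
    return nodes, edges
```

With two users sharing one address, the sketch produces two `Person` nodes but a single `City` node and a single `Country` node, because both matches yield the same identifiers for them.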

Optimizations. Optimizing the \mintinline{cypher}|MERGE| clauses of the script (three on node patterns and two on relationship patterns), which implement the set-theoretic unions, is crucial in reducing the overall execution time of the transformation.

As is the case in most database management systems, Neo4j provides facilities for query optimization. The two that are relevant in this context are indexes and uniqueness constraints. An index allows efficient retrieval of nodes with a given label that have a specific value for a given property. When we know in advance that all these values are unique, we can make further use of uniqueness constraints (UCs). Note that in our implementation, we maintain the invariant that each \mintinline{cypher}|_id| is unique across all elements in the output.

In the version of Neo4j Community Edition that we use for running the experiments, indexes are implemented using B-trees, which means that the cost of testing if an index with $n$ elements contains a given key is $\mathcal{O}(\log n)$. That is, by using indexes we can improve the worst-case complexity of Algorithm 1 to:

\begin{equation}
\mathcal{O}\left(t_{int}+n_{c}\cdot Int(G,T)\cdot\log|T(G)|\right) \tag{4}
\end{equation}
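The logarithmic lookup cost can be illustrated with a sorted list standing in for the index. This is a simplified sketch, not a B-tree and not Neo4j's implementation: the name `indexed_merge` is hypothetical, only key probes are counted, and inserting into a Python list shifts elements, a cost that a real B-tree avoids.

```python
def indexed_merge(index, identifier):
    """Binary-search a sorted list, inserting the key if it is absent.

    Returns the number of keys inspected during the search.
    """
    lo, hi, probes = 0, len(index), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if index[mid] < identifier:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(index) or index[lo] != identifier:
        index.insert(lo, identifier)   # absent: create the entry
    return probes
```

On an index holding $n$ keys, the search inspects at most $\lfloor\log_{2}n\rfloor+1$ keys, so a lookup among 1000 identifiers needs at most 10 probes, compared with up to 1000 comparisons for a linear scan.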

In the next section we comprehensively evaluate the advantages and disadvantages of using indexes and uniqueness constraints on nodes and relationships, defined on the label/property pair \mintinline{cypher}|_dummy|/\mintinline{cypher}|_id|.

Conflict detection. The consistency problem is unfortunately PSpace-complete by Corollary 7.3, so we cannot efficiently check the declarative specification at compile time. Instead, we need to be ready for potential inconsistencies at run time.

Figure 6 illustrates how one can detect conflicts on the property \mintinline{cypher}|code| when creating a new \mintinline{cypher}|City| node. We use the \mintinline{cypher}|ON MATCH| subclause of the \mintinline{cypher}|MERGE| clause to perform a comparison when we set a property for an existing node. Notice that a different rule could have led to the creation of this node and, consequently, \mintinline{cypher}|z.code| may be unset; in this case the comparison with \mintinline{cypher}|<>| does not evaluate to true (in Cypher, a comparison with a missing property yields \mintinline{cypher}|null|), so the \mintinline{cypher}|ELSE| branch is taken and the property receives the intended value.

\begin{minted}[xleftmargin=2em, linenos=true]{cypher}
MERGE (z:_dummy { _id: "(" + a.cityName + ")" })
ON CREATE SET z:City, z.code = a.cityCode
ON MATCH SET z:City, z.code =
    CASE WHEN z.code <> a.cityCode
        THEN "Conflict detected!"
        ELSE a.cityCode END
\end{minted}

Figure 6. Detecting conflicts on the property \mintinline{cypher}|code|.
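The run-time check of Figure 6 can be paraphrased in a few lines of Python. The sketch below is an assumption-laden analogue rather than the paper's code: the output graph is a dictionary keyed by identifier, a missing property plays the role of an unset value, and the names `merge_with_check` and `CONFLICT` are hypothetical.

```python
CONFLICT = "Conflict detected!"


def merge_with_check(store, identifier, label, key, value):
    """Merge a node and flag the property if two rules disagree on it."""
    node = store.get(identifier)
    if node is None:
        # Counterpart of ON CREATE: new node, no conflict possible.
        node = {"labels": {label}, "props": {key: value}}
        store[identifier] = node
        return node
    # Counterpart of ON MATCH: compare with the value already stored.
    node["labels"].add(label)
    current = node["props"].get(key)
    if current is not None and current != value:
        node["props"][key] = CONFLICT
    else:
        # Same value, or property not set yet by the creating rule.
        node["props"][key] = value
    return node
```

As in the Cypher version, once a property holds the conflict marker it keeps it: every later value differs from the marker and therefore takes the conflict branch again.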

9. Experiments

Our experimental study has three main objectives: (i) evaluate the benefits of using this formalism for transforming property graphs in practical use cases over a large amount of data, (ii) evaluate the overhead incurred by detecting potential inconsistencies at run time, and (iii) compare with a native openCypher approach such as the one presented in Figure 1(ii).

Experimental setting. We have implemented our property graph transformations in openCypher 9 using a local Neo4j Community Edition instance in version 5.9.0. For monitoring the results and performing the database management tasks required in our methodology, we have used Python 3.11 and the official Neo4j Python Driver 5.9.0. The source code, datasets, and configuration files are available on the public GitHub repository of the project. We performed the experiments on an HP EliteBook 840 G3 with an Intel Core i7-6600U CPU and 32GiB of system memory (2133 MHz).

Datasets. Due to the lack of benchmarks for property graph transformations, in order to build realistic scenarios we have adapted the mappings from several relational data integration scenarios from the iBench suite (arocena_ibench_2015). In particular, we encode relational input instances as property graphs by creating a node for each tuple (no edges), and we let the target instances be property graphs as well, thus simulating graph-to-graph transformations. Each mapping in a scenario corresponds to a rule of our formalism. Following the method described in Section 8, we compute an openCypher script implementing each rule.
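The encoding of relational instances described above (one node per tuple, no edges) can be sketched as follows. The function names, the dictionary layout of nodes and the generated statement are illustrative assumptions rather than the paper's loading code; the statement is only built as a string here and is not executed.

```python
def tuples_to_nodes(relation, columns, rows):
    """One node per tuple: the relation name is the label, columns are keys."""
    return [
        {"labels": {relation}, "props": dict(zip(columns, row))}
        for row in rows
    ]


def load_statement(relation):
    """A parameterised Cypher statement that could create these nodes.

    The parameter $rows is expected to be a list of property maps. The
    label cannot be passed as a parameter, so it is spliced into the text
    and must come from a trusted source.
    """
    return f"UNWIND $rows AS row CREATE (n:`{relation}`) SET n = row"
```

For example, `tuples_to_nodes("Person", ["name", "address"], [("Ann", 10)])` yields a single node labeled `Person` with properties `name` and `address`.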

The middle part of Table 1 reports the number $|\mathcal{L}_{in}|$ of input labels in each scenario (corresponding to the number of different relations in the original iBench scenario), the number $|\mathcal{L}^{node}_{out}|$ of output node labels, the number $|\mathcal{L}^{edge}_{out}|$ of output edge labels, and the number $|\mathcal{K}|$ of properties. The right part provides the number $|T|$ of rules in the scenario and the total number $n_{c}$ of content constructors. In each scenario, for each of the $|\mathcal{L}_{in}|$ input node labels, we generated up to $10^{5}$ nodes.

Table 1. Scenario characteristics.
Labels / Properties Rules
Scenario $|\mathcal{L}_{in}|$ $|\mathcal{L}^{node}_{out}|$ $|\mathcal{L}^{edge}_{out}|$ $|\mathcal{K}|$ $|T|$ $n_c$
PersonAddress 2 2 1 7 2 6
FlightHotel 2 3 2 5 1 7
PersonData 3 3 2 3 1 5
GUSToBIOSQL 7 5 4 80 8 18
DBLPToAmalgam1 7 5 4 140 10 22
Amalgam1ToAmalgam3 15 2 1 128 8 22
Figure 7. Comparison between uniqueness constraints and indexes for computing $T(G)$.
Figure 8. Impact of indexing strategies and implementation variants on the computation of $T(G)$.
Figure 9. Average computation time for different orders of execution of the rules.
Figure 10. Ratio between the time for computing $T(G)$ with and without conflict detection (using PI_NI).

Methodology. The main abstraction in our implementation is a Scenario, which describes an input property graph database that contains data of a given size stored in specific node and relationship properties. As previously shown in Figure 1 (1i), given the iBench output, we create a node for each tuple, whose properties (key/value pairs) are the column names and column values. We also add the Cypher specification of a set of indexes and constraints on the output side, which are created before executing the transformation, while the output data is still empty. This step is not time-consuming and takes on average less than one millisecond per index.

A scenario includes several Cypher queries, one for each transformation rule, that are applied successively. To simulate the process of transforming one graph into another, and to distinguish between input and output data, we have used disjoint sets of labels in the input and output instance. Thus, a single database instance holds both input and output data at the same time, but initially contains no output data. As a final step, a scenario is responsible for flushing the database and removing the indexes and constraints, so that the next scenario starts from a fresh database instance. Note that the query cache (execution plans) is cleared whenever an index or constraint is dropped. We monitored the total amount of time spent by Neo4j in applying the transformation rules. Each experiment generally reports the average over 5 runs of a scenario.
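The life cycle of a scenario can be sketched as follows. This is a minimal illustration, not the project's code: `run_scenario`, `average_time`, and the `execute` callable (standing in for whatever sends a statement to the database) are hypothetical names, and measuring client-side wall-clock time is an assumption that may differ from how the time spent by Neo4j was actually monitored.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def run_scenario(execute: Callable[[str], None],
                 setup: Sequence[str],
                 rules: Sequence[str],
                 teardown: Sequence[str]) -> float:
    """Run one scenario and return the time spent applying the rules."""
    for statement in setup:        # indexes and constraints, output still empty
        execute(statement)
    start = time.perf_counter()
    for rule in rules:             # one query per transformation rule
        execute(rule)
    elapsed = time.perf_counter() - start
    for statement in teardown:     # flush the data, drop indexes and constraints
        execute(statement)
    return elapsed


def average_time(execute: Callable[[str], None],
                 setup: Sequence[str],
                 rules: Sequence[str],
                 teardown: Sequence[str],
                 runs: int = 5) -> float:
    """Average the rule-application time over several fresh runs."""
    return mean(run_scenario(execute, setup, rules, teardown)
                for _ in range(runs))
```

Only the rule-application phase is timed; index creation and clean-up are deliberately left outside the measured interval, matching the description above.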

Alternative implementation using separate indexes. In Section 8, we discussed an implementation of the framework, the Plain implementation (PI), which uses a single index on the output side to speed up the retrieval of already existing nodes by Cypher's MERGE clause. Using a single index for all nodes in the output may severely impact the performance of the implementation, as the cost of index maintenance may become prohibitive. To quantify this, we compare with an alternative implementation, the Separate indexes implementation (SI), where the label is part of the argument list, similar to the case of relationships. The goal here is to mitigate the cost of maintaining a very large index by splitting the data into many smaller ones. Note that it is still possible to detect conflicts in this variant with a slight modification of the code from Figure 6.
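The difference between the two variants on the index side can be sketched as below. This is only an illustration under stated assumptions: the shared label `_Out`, the key property `_id`, and the index names are hypothetical and need not match the code of Section 8; the statement shape follows Neo4j 5's `CREATE INDEX ... FOR (n:Label) ON (n.property)` syntax.

```python
from typing import List, Sequence


def index_statements(variant: str,
                     output_labels: Sequence[str],
                     shared_label: str = "_Out",   # hypothetical shared label
                     key: str = "_id") -> List[str]:  # hypothetical key property
    """Return the index-creation statements for the PI or SI variant."""
    if variant == "PI":
        labels = [shared_label]          # one index covering all output nodes
    elif variant == "SI":
        labels = list(output_labels)     # one smaller index per output label
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    return [
        f"CREATE INDEX idx_{label} IF NOT EXISTS FOR (n:{label}) ON (n.{key})"
        for label in labels
    ]
```

With SI, the number of indexes grows with the number of output node labels, while each individual index stays smaller.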

Impact of indexes and uniqueness constraints. We start by comparing the benefits of uniqueness constraints on nodes (NUC) and indexes on nodes (NI) for the two alternative implementations, PI and SI. Figure 7 reports the results for our $\mathsf{FlightHotel}$ scenario, showing that for large input data, indexes tend to outperform uniqueness constraints.

We next investigate the impact of using combinations of indexes on nodes and relationships. We compared variants with indexes on nodes and relationships (NI_RI), indexes on nodes only (NI), indexes on relationships only (RI), and without indexes (WI) for the previous PI and SI implementations and their respective variants with conflict detection enabled: Conflict Detection over Plain implementation (CD/PI) and Conflict Detection over Separate indexes (CD/SI). Figure 8 shows, on a logarithmic scale, the results obtained for the $\mathsf{DBLPToAmalgam1}$ scenario. Other scenarios show similar trends and are reported in the appendix (TPG-github). It is clear from the figure that the choice of indexes is crucial. Using indexes only on nodes is more efficient than using a combination of indexes on nodes and relationships, which is in turn more efficient than using indexes only on relationships or using no index at all. The key reason for this behavior is that indexes on nodes already allow accessing the endpoints of edges, along with the edges themselves, efficiently. Additional indexes on edges do not help, but do incur additional overhead.

The positive point that emerges from this study is that the implementation does not require fine-tuning to be efficient in a specific scenario; using indexes only on nodes is consistently the best approach. Additionally, when using indexes only on nodes (NI), the Plain implementation (PI) is negligibly slower than the Separate indexes implementation (SI), whereas for other combinations of indexes it is noticeably slower. We discussed in Example 5.4 that PI allows for a more flexible use of labels than SI (which corresponds to having a dedicated Skolem function for each set of labels). In view of the above results, in the remaining experiments we focus on the Plain implementation with node indexes (PI_NI).

Impact of rule order. Our formalism is declarative and does not specify the order for the execution of the rules. Hence, we have investigated the impact of different orders on the computation time of the transformation. We compare the minimum, average, and maximum running times using random orders with the (fixed) order provided in iBench as baseline. Figure 9 reports the results for the $\mathsf{DBLPToAmalgam1}$ scenario; error bars indicate minimum and maximum computation times observed over 20 independent runs. For space reasons, $\mathsf{GUSToBIOSQL}$ and $\mathsf{Amalgam1ToAmalgam3}$, exhibiting similar results, are deferred to the appendix (TPG-github).

We can observe that the impact of the order in which the rules are applied on the execution time of the transformation is not substantial; randomized orders have a variance similar to that of a fixed order. It is fair to say that the performance of our implementation does not rely on any specific execution order.
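The rule-order experiment can be sketched as follows. The helper `order_statistics` and the `time_order` callable (assumed to run the rules in the given order and return the elapsed time) are hypothetical names introduced for illustration; the default of 20 samples mirrors the number of independent runs mentioned above.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence


def order_statistics(rules: Sequence[str],
                     time_order: Callable[[List[str]], float],
                     samples: int = 20,
                     seed: int = 0) -> Dict[str, float]:
    """Time `samples` random permutations of the rules.

    Returns the minimum, average, and maximum observed times.
    """
    rng = random.Random(seed)      # fixed seed for reproducible orders
    times = []
    for _ in range(samples):
        order = list(rules)
        rng.shuffle(order)         # uniformly random execution order
        times.append(time_order(order))
    return {"min": min(times), "avg": mean(times), "max": max(times)}
```

The same function applied to the fixed iBench order (by passing a `time_order` that ignores its argument's order) would give the baseline against which the random orders are compared.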

Overhead of detecting potential inconsistencies. We evaluated the impact of turning on conflict detection (over PI_NI) by investigating the ratio between computation time with and without conflict detection. The theoretical complexity of our implementation of Algorithm 1 with conflict detection is:

(5) $\mathcal{O}\left(t_{int} + n_c \cdot Int(G,T) \cdot \left(\log|T(G)| + c\right)\right)$

for $c$ a constant modeling the cost of the conditional statement. Thus the overhead incurred by detecting conflicts is $1 + \frac{c}{\log|T(G)|}$, which tends to $1$ in larger scenarios.
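The overhead ratio can be derived in one step, under the assumption (our reading, not stated explicitly here) that the cost without conflict detection is the same bound with $c = 0$. Writing $A = n_c \cdot Int(G,T)$ and $L = \log|T(G)|$ as shorthand:

```latex
\[
\frac{t_{int} + A\,(L + c)}{t_{int} + A\,L}
  \;=\; 1 + \frac{A\,c}{t_{int} + A\,L}
  \;\le\; 1 + \frac{c}{L}
  \;=\; 1 + \frac{c}{\log|T(G)|},
\]
```

where the inequality uses $t_{int} \ge 0$ and holds with equality when $t_{int} = 0$. Under this assumption, $1 + \frac{c}{\log|T(G)|}$ is an upper bound on the ratio, and it tends to $1$ as $|T(G)|$ grows.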

The results presented in Figure 10 experimentally validate that the incurred overhead of conflict detection is reasonably low for large input instances, and stays within a constant factor, roughly between 1 and 1.3, depending on the scenario.

Figure 11. Run-time comparison for different likelihood of conflicts (using CD/PI_NI with $10^5$ nodes for each input type).

Robustness against incidence of conflicts. iBench's scenarios have very few or no conflicts. To investigate whether these results generalize to more conflict-prone scenarios, we designed an experiment using an additional randomization step: when a rule attempts to set a value for an attribute, the value is changed randomly. This allows us to control the average number of conflicts in the output. Figure 11 reports, on a logarithmic scale, the results for all our scenarios, with the likelihood of conflicts ranging from 0% to 100%. Note that the size of the output is preserved because only the attributes are affected, not the topology of the graph.
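The randomization step can be sketched as below. This is an assumption-laden illustration: the function name `maybe_randomize` and the way a value is perturbed (appending a random suffix) are hypothetical; the text only states that the value is changed randomly with a controllable likelihood.

```python
import random
from typing import Any


def maybe_randomize(value: Any, likelihood: float, rng: random.Random) -> Any:
    """With probability `likelihood`, replace `value` by a perturbed one.

    Two rules writing the same property of the same output element then
    disagree whenever at least one of them was perturbed.
    """
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must lie in [0, 1]")
    if rng.random() < likelihood:
        # Hypothetical perturbation: the result always differs from `value`.
        return f"{value}#{rng.getrandbits(32)}"
    return value
```

Because only property values are touched, the nodes and edges produced by the rules are the same as without randomization, which is why the output size is preserved.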

We observe that the prevalence of conflicts has no impact on the execution time, suggesting that our framework’s stability is preserved, even with a large proportion of conflicts in the output.

Figure 12. Horizontal scaling, with varying number of independent copies of the scenario.

Horizontal scalability. We have investigated how well our framework scales with the number of rules and input labels. We built larger scenarios by taking an increasing number of independent copies of the scenarios from Table 1. The resulting transformations reach over one hundred rules and input labels, and over 1.5 million input nodes (in total). Figure 12 reports the results for the $\mathsf{PersonData}$ and $\mathsf{Amalgam1ToAmalgam3}$ scenarios.

We observe that the running time scales smoothly (almost linearly) as the number of copies increases. Results on other scenarios follow similar trends and are deferred to the appendix (TPG-github).
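One way to obtain independent copies of a scenario is to rename its labels per copy, so that the copies share no input or output data. The sketch below illustrates this idea only; the renaming scheme, the function `replicate_rules`, and the toy rule are assumptions and may differ from how the larger scenarios were actually built.

```python
import re
from typing import List, Sequence


def replicate_rules(rules: Sequence[str],
                    labels: Sequence[str],
                    copies: int) -> List[str]:
    """Return `copies` label-disjoint copies of the rule queries.

    Every occurrence of `:Label` is rewritten to `:Label_i` in copy i.
    """
    if not labels:
        raise ValueError("at least one label is required")
    # Longest labels first, so that a label is never matched as a prefix.
    alternatives = "|".join(
        re.escape(label) for label in sorted(labels, key=len, reverse=True)
    )
    pattern = re.compile(r":(" + alternatives + r")\b")
    result = []
    for i in range(copies):
        for rule in rules:
            result.append(
                pattern.sub(lambda m, i=i: f":{m.group(1)}_{i}", rule)
            )
    return result
```

With this scheme, the number of rules and input labels grows linearly in the number of copies, which matches the axis used for horizontal scaling.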

Improvement over handcrafted scripts: a user study. To compare empirically the readability and usability of the script-based approach and our framework, we ran an ad-hoc user study involving 12 participants who were all already familiar with openCypher.

We compared the ability of the participants to understand the behavior of some provided openCypher scripts and transformations in clearly defined scenarios. Only 25% of the participants were able to fully understand the behavior of the openCypher scripts, whereas 67% of them succeeded with transformations. On average, participants scored 50% on openCypher scripts and 90% on our framework. Participants were also asked to compare openCypher scripts and our framework in terms of understandability, intuitiveness, and flexibility; overall, they favored our framework. For space reasons, the questionnaire, the participants' answers, and the full discussion of the obtained results are deferred to the appendix (TPG-github).


10. Experiments

User study. The user study consists of four parts:

  • The first part evaluates the understandability of openCypher scripts: we asked the participants four yes/no questions on whether an assertion holds w.r.t. the behavior of a provided script in a concrete transformation scenario (e.g., Does this script create as many Director nodes as there are Person nodes that have an outgoing relationship of type DIRECTED to a Movie node?).

  • The second part evaluates the understandability of property graph transformations, with four similar yes/no questions on a slightly different transformation to avoid biases.

  • The third part required participants to modify some provided openCypher scripts and transformations to adapt them to a new requirement. We collected the participants' answers and checked them.

  • The last part required the participants to give their opinion on the following questions and to indicate, on a scale from 1 to 5 (3 is neutral), whether they found openCypher scripts or transformation rules better:

    • Which one of the two methods do you find easier to understand?

    • Which one of the two methods do you find more intuitive? (Better for describing the desired output.)

    • Which one of the two methods do you find more flexible? (Easier to adapt to a new specification.)

We collected the answers of 12 participants, who were asked to self-report their level of expertise on a scale from 1 (Novice) to 5 (Expert) on the following topics:

  • How would you rate your knowledge about databases?
    The answers filled in by the participants ranged from 3 to 5, inclusive.

  • How would you rate your knowledge about openCypher?
    The answers filled in by the participants ranged from 2 to 4, inclusive.

  • How would you rate your knowledge about the MERGE clause of openCypher?
    The answers filled in by the participants ranged from 1 to 5, inclusive.

  • How would you rate your knowledge about property graph transformations?
    The answers filled in by the participants ranged from 1 to 3, inclusive.

We thus have a pool of participants who all have prior exposure to openCypher (2 to 4) but who differ widely in their knowledge of the MERGE clause of openCypher (the basic tool for updates in openCypher), ranging from novice to expert.

The results on the first two parts are as follows:

  • The average number of correct answers is 50% (2.0 out of 4) for the understandability of the openCypher scripts, and 90% (3.6 out of 4) for the understandability of the transformation rules.

  • 25% of the participants answered all questions correctly in the first part, and 67% did so in the second part.

  • All participants scored higher on their individual understanding of transformation rules compared to openCypher scripts.

These results indicate that even the basic openCypher scripts used to transform property graphs are difficult to understand, whereas our framework, despite being entirely new to the participants, was widely understood.

The results on the last part are as follows (recall that 3 is neutral, 1 is a strong preference for openCypher scripts, and 5 is a strong preference for transformation rules):

  • Which one of the two methods do you find easier to understand?
    Collected answers range from 1 to 5 with an average of 3.3.

  • Which one of the two methods do you find more intuitive? (Better for describing the desired output.)
    Collected answers range from 3 to 5 with an average of 3.8.

  • Which one of the two methods do you find more flexible? (Easier to adapt to a new specification.)
    Collected answers range from 3 to 5 with an average of 4.1.

Moreover, we noticed that the participants with a low understanding of openCypher scripts (a score of 2 or less out of 4 in the first part) tended to give less credit to the transformation rules than the other participants. We therefore split the participants into two groups to investigate this further.

For the 8 participants who scored 2 or lower out of 4 in their understanding of openCypher scripts, the results on the last part are as follows (recall that 3 is neutral, 1 is a strong preference for openCypher scripts, and 5 is a strong preference for transformation rules):

  • Which one of the two methods do you find easier to understand?
    Collected answers range from 1 to 5 with an average of 2.75.

  • Which one of the two methods do you find more intuitive? (Better for describing the desired output.)
    Collected answers range from 3 to 5 with an average of 3.9.

  • Which one of the two methods do you find more flexible? (Easier to adapt to a new specification.)
    Collected answers range from 3 to 5 with an average of 3.9.

For the 4 participants who scored 3 or higher out of 4 in their understanding of openCypher scripts, the results on the last part are as follows (recall that 3 is neutral, 1 is a strong preference for openCypher scripts, and 5 is a strong preference for transformation rules):

  • Which one of the two methods do you find easier to understand?
    Collected answers range from 1 to 5 with an average of 4.7.

  • Which one of the two methods do you find more intuitive? (Better for describing the desired output.)
    Collected answers range from 3 to 5 with an average of 3.7.

  • Which one of the two methods do you find more flexible? (Easier to adapt to a new specification.)
    Collected answers range from 3 to 5 with an average of 4.7.

This suggests that the participants who were less convinced that openCypher scripts are more error-prone and harder to interpret and analyze are those who did not realize by themselves how difficult these scripts are to understand and manipulate. Moreover, participants with a good understanding of openCypher clearly find our framework easier to understand.

With this study, we gathered empirical evidence that the script-based approach can be error-prone and hard to interpret and analyze (i.e., less usable), and that the improvement in usability and accuracy over handcrafted, script-based solutions is attested by a majority of the participants.

Figure 13. Run-time comparison with the baseline approach, depending on the number of input nodes of each type in $G$.

Comparison with native Cypher approach. Finally, we compared our framework (using PI_NI) with ad-hoc transformation scripts (B-NI and B, respectively with and without node indexes), such as the one presented in Figure 1 (1ii). The results for $\mathsf{PersonAddress}$, $\mathsf{FlightHotel}$, and $\mathsf{PersonData}$ are presented in Figure 13. For larger scenarios, such as $\mathsf{GUSToBIOSQL}$, $\mathsf{DBLPToAmalgam1}$, and $\mathsf{Amalgam1ToAmalgam3}$, handcrafted transformation scripts are exceedingly large due to the number of rules and properties involved.

We can observe that our solution clearly outperforms the handcrafted solutions in most cases. The only exception is the $\mathsf{PersonData}$ scenario, for which the B-NI baseline is slightly better than our solution, while the B baseline is still outperformed. The reason lies in the nature of this scenario, for which the collect clause contains only one element.

Figure 14. Horizontal scaling, with varying number of independent copies of the scenario.
Figure 15. Average computation time for different orders of execution of the rules; error bars indicate minimum and maximum computation times observed over 20 independent runs.
Figure 16. Impact of indexing strategies and implementation variants on the computation of $T(G)$.
Table 2. Running times and size of intermediate data (ICIJ).
Rules $t_{int}$ $t$ $Int(G,T)$ $|T(G)|$ $O/I$ $t_{int}^{e}$ $t^{e}$
$R1$-$R4$ 2,757 11,192 374,955 748,524 1.996 0.007 0.015
$R5$-$R9$ 3,553 5,946 62,242 82,616 1.327 0.057 0.072
$R10$-$R13$ 15,509 36,775 1,906,686 1,905,547 0.999 0.008 0.019
$R14$-$R17$ 9,667 21,006 493,556 1,173,720 2.378 0.020 0.018
$R18$ only 8,407 25,640 785,124 1,570,470 2.000 0.011 0.016

10.1. Use-case Study: Improving Data Integration

$\underset{\langle toLower(o.name) = toLower(p.name) \rangle}{(o:\mathsf{Officer}) \xrightarrow{:\mathsf{similar}}{}^{0..\infty} (:\mathsf{Officer}) \xrightarrow{:\mathsf{registered\_address}} (:\mathsf{Address}) \xrightarrow{:\mathsf{similar}} (:\mathsf{Address}) \xleftarrow{:\mathsf{registered\_address}} (:\mathsf{Officer}) \xleftarrow{:\mathsf{similar}}{}^{0..\infty} (p:\mathsf{Officer})}$
((o):𝖳_𝖮𝖿𝖿𝗂𝖼𝖾𝗋)():𝖳_𝖲𝗂𝗆𝗂𝗅𝖺𝗋 link="𝗌𝗂𝗆𝗂𝗅𝖺𝗋𝗇𝖺𝗆𝖾𝖺𝗇𝖽𝖺𝖽𝖽𝗋𝖾𝗌𝗌𝖺𝗌" ((p):𝖳_𝖮𝖿𝖿𝗂𝖼𝖾𝗋)absentabsent:𝑜𝖳_𝖮𝖿𝖿𝗂𝖼𝖾𝗋absent:𝖳_𝖲𝗂𝗆𝗂𝗅𝖺𝗋 delimited-⟨⟩𝑙𝑖𝑛𝑘"𝗌𝗂𝗆𝗂𝗅𝖺𝗋𝗇𝖺𝗆𝖾𝖺𝗇𝖽𝖺𝖽𝖽𝗋𝖾𝗌𝗌𝖺𝗌" absent:𝑝𝖳_𝖮𝖿𝖿𝗂𝖼𝖾𝗋\displaystyle\implies\underset{}{\left((o):\mathsf{T\_Officer}\right)}\>% \underset{}{\leavevmode\hbox to127.94pt{\vbox to25.96pt{\pgfpicture% \makeatletter\hbox{\hskip 63.96802pt\lower-12.97891pt\hbox to0.0pt{% \pgfsys@beginscope\pgfsys@invoke{ }\definecolor{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }\pgfsys@color@rgb@fill{0}{0}% {0}\pgfsys@invoke{ }\pgfsys@setlinewidth{0.4pt}\pgfsys@invoke{ }\nullfont\hbox to% 0.0pt{\pgfsys@beginscope\pgfsys@invoke{ }{{}}{{}}\hbox{\hbox{{% \pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1% .0}{-24.65875pt}{2.97699pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor% }{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }% \pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{$\scriptstyle\mathit{(% \mathsf{})}\,:\,\mathsf{T\_Similar}$ }} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} {{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{}{{ {}{}}}{ {}{}} {{}{{}}}{{}{}}{}{{}{}} { }{{{{}}\pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1% .0}{-60.56831pt}{-7.4792pt}\pgfsys@invoke{ }\hbox{{\definecolor{pgfstrokecolor% }{rgb}{0,0,0}\pgfsys@color@rgb@stroke{0}{0}{0}\pgfsys@invoke{ }% \pgfsys@color@rgb@fill{0}{0}{0}\pgfsys@invoke{ }\hbox{{$\scriptstyle\langle link% =\mathsf{"similar\ name\ and\ address\ as"}\rangle$ }} }}\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} { {}{}{}}{}{ {}{}{}}{}{{}{}}{{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }{{}{{}}{}{}% } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} 
{{}{}{}}{}{{}{}{}}{}{{}{}}{{}}\hbox{\hbox{{\pgfsys@beginscope\pgfsys@invoke{ }% {{}{{}}{}{}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} {{}}{}{{}}{}{{}} {}{}{}{}{}{}{{}}\pgfsys@moveto{-63.76802pt}{0.0pt}\pgfsys@lineto{63.30803pt}{0% .0pt}\pgfsys@stroke\pgfsys@invoke{ }{{}{{}}{}{}{{}}{{{}}{{{}}{% \pgfsys@beginscope\pgfsys@invoke{ }\pgfsys@transformcm{1.0}{0.0}{0.0}{1.0}{63.% 30803pt}{0.0pt}\pgfsys@invoke{ }\pgfsys@invoke{ \lxSVG@closescope }% \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}{{}}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{}{}{}\hss}% \pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}% \lxSVG@closescope\endpgfpicture}}}\>\underset{}{\left((p):\mathsf{T\_Officer}% \right)}⟹ start_UNDERACCENT end_UNDERACCENT start_ARG ( ( italic_o ) : sansserif_T _ sansserif_Officer ) end_ARG start_UNDERACCENT end_UNDERACCENT start_ARG ( ) : sansserif_T _ sansserif_Similar ⟨ italic_l italic_i italic_n italic_k = " sansserif_similar sansserif_name sansserif_and sansserif_address sansserif_as " ⟩ end_ARG start_UNDERACCENT end_UNDERACCENT start_ARG ( ( italic_p ) : sansserif_T _ sansserif_Officer ) end_ARG
Figure 17. Improved similarity detection ($R_{15}$).

10.2. Offshore Leaks Dataset

\begin{align}
\underset{\langle sourceID = \mathsf{"Malta\ registry"},\ a.address = a.address \rangle}{(a:\mathsf{Address})}
  &\implies \underset{\langle source = a.sourceID \rangle}{\left(x=(a):\mathsf{T\_Address}\right)}
  \xrightarrow{:\mathsf{T\_LOCATED}}
  \underset{\langle name = a.country \rangle}{\left(y=(a.country\_code):\mathsf{T\_Country}\right)} \tag{$R_1$} \\
\underset{\langle sourceID = \mathsf{"Malta\ registry"},\ \neg\left(a.address = a.address\right) \rangle}{(a:\mathsf{Address})}
  &\implies \underset{\langle source = a.sourceID \rangle}{\left(x=(a):\mathsf{T\_Address}\right)} \tag{$R_2$} \\
\underset{\langle \neg\left(sourceID = \mathsf{"Malta\ registry"}\right),\ a.address = a.address \rangle}{(a:\mathsf{Address})}
  &\implies \underset{\langle source = a.sourceID \rangle}{\left(x=(a):\mathsf{T\_Address}\right)}
  \xrightarrow{:\mathsf{T\_LOCATED}}
  \underset{\langle name = a.country \rangle}{\left(y=(a.country\_codes):\mathsf{T\_Country}\right)} \tag{$R_3$} \\
\underset{\langle \neg\left(sourceID = \mathsf{"Malta\ registry"}\right),\ \neg\left(a.address = a.address\right) \rangle}{(a:\mathsf{Address})}
  &\implies \underset{\langle source = a.sourceID \rangle}{\left(x=(a):\mathsf{T\_Address}\right)} \tag{$R_4$}
\end{align}
(i) Refactoring registered addresses ($R_1$-$R_4$).
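The case split in the four address rules can be read operationally. The Python sketch below is only an illustration under stated assumptions, not the openCypher implementation described in the paper. It assumes that a condition of the form a.address = a.address holds exactly when the property is defined, and that a constructor such as (y=(a.country_code):T_Country) identifies the target node by its argument, so that equal arguments yield the same node. The function name `apply_address_rules`, the dictionary encoding of nodes, and the `id` field are hypothetical.

```python
from __future__ import annotations

from typing import Any


def apply_address_rules(addresses: list[dict[str, Any]]):
    """Apply a simplified reading of the four address rules to source nodes."""
    nodes: dict[tuple, dict[str, Any]] = {}  # target nodes keyed by (label, identity)
    edges: set[tuple] = set()                # (source key, edge label, target key)

    def upsert(key: tuple, props: dict[str, Any]) -> None:
        # Create the node if needed; reject two different values for one property.
        node = nodes.setdefault(key, {})
        for name, value in props.items():
            if value is None:  # undefined source property: nothing to set
                continue
            if name in node and node[name] != value:
                raise ValueError(f"conflicting values for {key}.{name}")
            node[name] = value

    for a in addresses:
        malta = a.get("sourceID") == "Malta registry"
        has_address = "address" in a  # assumed reading of a.address = a.address

        # All four rules create the T_Address node identified by a.
        x = ("T_Address", a["id"])
        upsert(x, {"source": a.get("sourceID")})

        # Only the rules with a defined address also create the country node.
        if has_address:
            code_key = "country_code" if malta else "country_codes"
            # Assumption: the country node is identified by the code value;
            # handling of a missing code is not modeled here.
            y = ("T_Country", a.get(code_key))
            upsert(y, {"name": a.get("country")})
            edges.add((x, "T_LOCATED", y))

    return nodes, edges


ADDRESSES = [
    {"id": 1, "sourceID": "Malta registry", "address": "1 Main St",
     "country_code": "MT", "country": "Malta"},
    {"id": 2, "sourceID": "Panama Papers", "address": "2 High St",
     "country_codes": "MT", "country": "Malta"},
    {"id": 3, "sourceID": "Panama Papers"},
]

if __name__ == "__main__":
    result_nodes, result_edges = apply_address_rules(ADDRESSES)
    print(sorted(result_nodes))
    print(sorted(result_edges))
```

In this reading, the two addresses with the same country code share a single T_Country node, and a second address assigning a different name to that node would raise an error, which mirrors the kind of property conflict the framework is designed to detect.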
\begin{align}
&(i:\mathsf{Intermediary}) \xrightarrow{:\mathsf{registered\_address}} (a:\mathsf{Address}) \implies
  (x=(i):\mathsf{T\_Intermediary}) \xrightarrow{:\mathsf{T\_REG\_ADDRESS}} (y=(a):\mathsf{T\_Address}) \tag{$R_5$} \\
&\underset{\langle \neg\left(address = \mathsf{""}\right),\ address = address,\ country\_code = country\_code \rangle}{(i:\mathsf{Intermediary})} \implies
  (x=(i):\mathsf{T\_Intermediary}) \xrightarrow{:\mathsf{T\_REG\_ADDRESS}}
  \underset{\langle source = i.sourceID \rangle}{(y=(i.address):\mathsf{T\_Address})}, \notag \\
&\qquad \left(y\right) \xrightarrow{:\mathsf{T\_LOCATED}}
  \underset{\langle name = i.countries \rangle}{\left(z=(i.country\_codes):\mathsf{T\_Country}\right)} \tag{$R_6$} \\
&\underset{\langle \neg\left(address = \mathsf{""}\right),\ address = address,\ \neg\left(country\_code = country\_code\right) \rangle}{(i:\mathsf{Intermediary})} \implies
  (x=(i):\mathsf{T\_Intermediary}) \xrightarrow{:\mathsf{T\_REG\_ADDRESS}}
  \underset{\langle source = i.sourceID \rangle}{(y=(i.address):\mathsf{T\_Address})} \tag{$R_7$}
\end{align}
\begin{align*}
(R_{8})\quad & \underset{\langle \neg\left(address=\mathsf{""}\right),\> address=address,\> \neg\left(country\_code=country\_code\right)\rangle}{(i:\mathsf{Intermediary})\>\xrightarrow{\,:\,\mathsf{intermediary\_of}\,}\>(e:\mathsf{Entity})}\implies\\
& (x=(i):\mathsf{T\_Intermediary})\>\xrightarrow{\,:\,\mathsf{T\_REG\_ADDRESS}\,}\>\underset{\langle source=i.sourceID\rangle}{(y=(e.address):\mathsf{T\_Address})}
\end{align*}
\begin{align*}
(R_{9})\quad & \underset{\langle \neg\left(address=\mathsf{""}\right),\> address=address,\> country\_code=country\_code\rangle}{(i:\mathsf{Intermediary})\>\xrightarrow{\,:\,\mathsf{intermediary\_of}\,}\>(e:\mathsf{Entity})}\implies\\
& (x=(i):\mathsf{T\_Intermediary})\>\xrightarrow{\,:\,\mathsf{T\_REG\_ADDRESS}\,}\>\underset{\langle source=i.sourceID\rangle}{(y=(e.address):\mathsf{T\_Address})},\\
& \left(y\right)\>\xrightarrow{\,:\,\mathsf{T\_LOCATED}\,}\>\underset{\langle name=i.countries\rangle}{\left(z=(e.country\_codes):\mathsf{T\_Country}\right)}
\end{align*}
(ii) Uniformizing address information for intermediaries ($R_{5}$--$R_{9}$).
\begin{align*}
(R_{10})\quad & (i:\mathsf{Intermediary})\implies \underset{\langle name=i.name,\> status=i.status,\> valid\_until=i.valid\_until,\> source=i.sourceID\rangle}{\left(x=(i):\mathsf{T\_Intermediary}\right)}\\
(R_{11})\quad & (a:\mathsf{Address})\implies \underset{\langle address=a.address,\> orig\_addr=a.orig\_addr,\> valid\_until=a.valid\_until\rangle}{\left(x=(a):\mathsf{T\_Address}\right)}\\
(R_{12})\quad & (e:\mathsf{Entity})\implies \underset{\langle name=e.name,\> orig\_name=e.orig\_name,\> inact\_date=e.inact\_date,\> inc\_date=e.inc\_date,\> \dots,\> source=e.sourceID\rangle}{\left(x=(e):\mathsf{T\_Entity}\right)}\\
(R_{13})\quad & (o:\mathsf{Officer})\implies \underset{\langle name=o.name,\> status=o.status,\> source=o.sourceID\rangle}{\left(x=(o):\mathsf{T\_Officer}\right)}
\end{align*}
(iii) Exporting the nodes ($R_{10}$--$R_{13}$).
Figure 18. Rules of the ICIJ database transformation.
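To make the reading of these rules concrete, the following is a minimal Python sketch of how an $R_{10}$-style node export and a simplified $R_{8}$-style rule could be evaluated over an in-memory graph. It is not the openCypher implementation described in the paper. The class `TargetGraph`, the function `transform`, the dictionary-based input encoding, and the reduction of the rule condition to "the entity's address is non-empty" are all assumptions made for illustration. The sketch assumes that a target element is identified by the arguments of its constructor, so that $(y=(e.address))$ yields one address node per distinct address value, and that assigning two different values to the same property of the same element is reported as a conflict.

```python
from dataclasses import dataclass, field


@dataclass
class TargetGraph:
    """Hypothetical target graph; elements are keyed by their constructor arguments."""
    nodes: dict = field(default_factory=dict)  # key -> {"labels": set, "props": dict}
    edges: dict = field(default_factory=dict)  # (src_key, label, dst_key) -> props

    def node(self, key, label, **props):
        # The same key always denotes the same target node; labels accumulate.
        n = self.nodes.setdefault(key, {"labels": set(), "props": {}})
        n["labels"].add(label)
        for name, value in props.items():
            old = n["props"].setdefault(name, value)
            if old != value:
                # Two rule applications disagree on a property value.
                raise ValueError(f"conflicting values for {name!r} on {key!r}")
        return key

    def edge(self, src, label, dst):
        # At most one edge per (source, label, target) triple.
        self.edges.setdefault((src, label, dst), {})


def transform(intermediaries, entities, intermediary_of):
    g = TargetGraph()
    # R10-style rule: export every Intermediary node (only two properties shown).
    for i_id, i in intermediaries.items():
        g.node(("node", i_id), "T_Intermediary",
               name=i["name"], source=i["sourceID"])
    # Simplified R8-style rule (assumed condition: entity address is non-empty).
    for i_id, e_id in intermediary_of:
        i, e = intermediaries[i_id], entities[e_id]
        if e.get("address", "") != "":
            x = g.node(("node", i_id), "T_Intermediary")
            # The address node is keyed by the address *value*, not by e itself.
            y = g.node(("value", e["address"]), "T_Address", source=i["sourceID"])
            g.edge(x, "T_REG_ADDRESS", y)
    return g
```

Under these assumptions, two intermediaries whose entities share an address are linked to a single `T_Address` node. If their `sourceID` values differ, the sketch raises `ValueError`, which is the kind of property conflict the consistency check is meant to detect.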
\begin{align*}
(R_{14})\quad & (o:\mathsf{Officer})\>\xrightarrow{\,:\,\mathsf{similar}\,}\>(p:\mathsf{Officer})\implies\\
& (x=(o):\mathsf{T\_Officer})\>\xrightarrow[\langle link=\mathsf{"similar\ name\ and\ address\ as"}\rangle]{\,()\,:\,\mathsf{T\_Similar}\,}\>(y=(p):\mathsf{T\_Officer})
\end{align*}
\begin{align*}
(R_{16})\quad & (a:\mathsf{Address})\>\xrightarrow{\,:\,\mathsf{same\_as}\,}\>(b:\mathsf{Address})\implies\\
& (x=(a):\mathsf{T\_Address})\>\xrightarrow{\,:\,\mathsf{T\_SAME\_AS}\,}\>(y=(b):\mathsf{T\_Address})
\end{align*}
\begin{align*}
(R_{17})\quad & (o:\mathsf{Officer})\>\xrightarrow{\,:\,\mathsf{registered\_address}\,}\>(a:\mathsf{Address})\implies\\
& (x=(o):\mathsf{T\_Officer})\>\xrightarrow{\,:\,\mathsf{T\_REGISTERED\_ADDRESS}\,}\>(y=(a):\mathsf{T\_Address})
\end{align*}
i Improving similarity detection (R14R17𝑅14𝑅17R14-R17italic_R 14 - italic_R 17).
(R18) $\underset{\langle jurisdiction\_desc = jurisdiction\_desc \rangle}{(e:\mathsf{Entity})} \implies (x=(e):\mathsf{T\_Entity}) \xrightarrow{:\mathsf{T\_IN\_JURIS}} \underset{\langle juris = e.jurisdiction\_desc \rangle}{(y=(e.jurisdiction\_desc):\mathsf{T\_Jurisdiction})},\ (y) \xrightarrow{:\mathsf{T\_RELATED}} (z=(e.jurisdiction):\mathsf{T\_Country})$

(ii) Refactoring jurisdictions (R18).
Figure 19. Rules of the ICIJ database transformation. (Bis)

In this section, we compare the cost of running the whole transformation with the cost of merely querying the source property graph to extract the bindings (intermediate data). To this end, we use a real-world dataset, the Offshore Leaks Database from the International Consortium of Investigative Journalists (ICIJ) (ICIJ-github), a property graph with 1,908,466 nodes and 3,193,390 edges taken from (10.14778/3611479.3611506). This dataset consolidates data from several leaks (Panama Papers, Bahamas Leaks, etc.) collected by ICIJ over a period of ten years, but still presents the consolidated data in a heterogeneous manner. The dataset contains information about entities (off-shore companies), their officers, intermediaries (middlemen who help set up off-shore companies), and jurisdictions (countries or territories where off-shore companies are registered). We have designed a modular 18-rule transformation that aims to make the presentation of the information contained in the graph uniform. The rules are grouped into 5 subsets, each addressing a specific refactoring goal motivated below. For space reasons, we have deferred the rules themselves to the appendix (TPG-github).

Refactoring registered addresses (R1–R4). The ICIJ database contains the registered addresses of officers and entities. These rules are responsible for creating nodes representing countries and linking them to the addresses. Because the data is semi-structured and collected from multiple sources, information may be stored in attributes that have different names, or may not be available at all. These four rules cover all of these cases.

Uniformizing address information for intermediaries (R5–R9). After careful investigation, we found that the registered address of an intermediary can be stored in three different ways in the database: (i) the intermediary can have a direct relationship with an address, (ii) the address can be stored in the properties of the node itself, and (iii) when neither of the two previous cases applies, the address has to be retrieved from an entity linked to this intermediary. These rules make it possible to store address information consistently.
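The three-way case analysis above can be sketched in a few lines of Python. This is only an illustration of the fallback logic, not the declarative rules R5–R9 themselves (those are given in the appendix). The dictionary-based representation, the property name `address`, and the function name `resolve_address` are all hypothetical, and the priority of case (i) over case (ii) is an assumption; the text only states that case (iii) applies when neither of the other two does.

```python
def resolve_address(intermediary, address_of, linked_entities):
    """Return an address for an intermediary, or None if none can be found.

    intermediary    -- dict with an "id" and, possibly, an "address" property
                       (hypothetical property name)
    address_of      -- dict mapping a node id to the address it is directly
                       related to
    linked_entities -- dict mapping an intermediary id to the ids of the
                       entities linked to it
    """
    node_id = intermediary["id"]

    # Case (i): the intermediary has a direct relationship with an address.
    if node_id in address_of:
        return address_of[node_id]

    # Case (ii): the address is stored in the properties of the node itself.
    if intermediary.get("address"):
        return intermediary["address"]

    # Case (iii): neither applies, so use the address of a linked entity.
    for entity_id in linked_entities.get(node_id, []):
        if entity_id in address_of:
            return address_of[entity_id]

    return None
```

In the declarative setting, each case corresponds to one or more rules whose left-hand sides match the respective situation, so no explicit control flow is needed; the sketch only makes the case distinction explicit.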

Exporting the nodes (R10–R13). These rules copy the node information from the source to the target; they are necessary to preserve all the information from the original graph.

Improving similarity detection (R14–R17). Because the dataset consolidates multiple leaks, certain specific relationships, such as similar and same_as, are used to indicate that some officers (resp. addresses) are likely to represent the same real-life entity. These rules focus on exporting this data and improving the similarity detection. This is illustrated by Rule 15, shown in Figure 17, which composes the relationships similar and same_as to ensure that both its endpoints correspond to officers having the same address. (This is because similar encompasses address similarity.) The rule then checks whether their names are also similar. If both conditions hold, it safely adds a similarity edge between the endpoints in the output.

Refactoring jurisdictions (R18). The last rule is responsible for connecting the jurisdictions with their associated countries; this information is not explicitly stored in the initial database.
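To make the effect of a rule shaped like R18 concrete, the following is a minimal Python sketch over a toy in-memory graph; it is not the openCypher script used in the experiments. It assumes that each target node is identified by the arguments of its constructor, so that entities sharing a jurisdiction description yield a single jurisdiction node. The class `TargetGraph`, the function `apply_r18`, the tags used to keep constructors apart, and the decision to skip the country part when `jurisdiction` is missing are all assumptions made for illustration. The filter on the left-hand side is read here as requiring `jurisdiction_desc` to be defined.

```python
from dataclasses import dataclass, field


@dataclass
class TargetGraph:
    """Toy target graph whose nodes are keyed by constructor arguments."""
    nodes: dict = field(default_factory=dict)  # key -> {"labels": set, "props": dict}
    edges: set = field(default_factory=set)    # (source key, edge label, target key)

    def node(self, key, label):
        # The same constructor arguments always denote the same target node.
        entry = self.nodes.setdefault(key, {"labels": set(), "props": {}})
        entry["labels"].add(label)
        return key

    def set_prop(self, key, name, value):
        # Refuse two different values for one property of one node.
        props = self.nodes[key]["props"]
        if name in props and props[name] != value:
            raise ValueError(f"conflicting values for {name!r} on node {key!r}")
        props[name] = value


def apply_r18(entities, target):
    """Apply an R18-like rule to source entities given as (id, properties) pairs."""
    for ent_id, props in entities:
        desc = props.get("jurisdiction_desc")
        if desc is None:
            continue  # left-hand side does not match
        x = target.node(("entity", ent_id), "T_Entity")
        y = target.node(("juris", desc), "T_Jurisdiction")
        target.set_prop(y, "juris", desc)
        target.edges.add((x, "T_IN_JURIS", y))
        code = props.get("jurisdiction")
        if code is not None:  # assumption: skip the country part if absent
            z = target.node(("country", code), "T_Country")
            target.edges.add((y, "T_RELATED", z))
    return target


# Hypothetical sample data: two entities share one jurisdiction.
sample = [
    (1, {"jurisdiction_desc": "Bahamas", "jurisdiction": "BAH"}),
    (2, {"jurisdiction_desc": "Bahamas", "jurisdiction": "BAH"}),
    (3, {"name": "entity without jurisdiction"}),
]
graph = apply_r18(sample, TargetGraph())
```

Because the jurisdiction node is keyed by the very value assigned to its `juris` property, this particular rule cannot produce conflicting property values; the check in `set_prop` matters for rules where the key and the assigned value are unrelated.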

Results. Our experimental results are reported in Table 2. We report the time $t_{int}$ (in ms) the database takes to retrieve the intermediate data; the total time $t$ of running the transformation (extracting the bindings and constructing the output); the size $Int(G,T)$ of the intermediate data; the size of the output $T(G)$; and the ratio $O/I$ of the size of the output to the size of the intermediate data. To account for the differences in the sizes of the outputs of the respective tasks, we also report the average time $t_{int}^{e}$ taken to produce each binding of the intermediate result, and the average time $t^{e}$ taken to construct each element of the output. We break down the reported values into groups of rules corresponding to the aforementioned integration tasks.

There are several things that we can learn from Table 2. First, the overhead $t^{e}/t_{int}^{e}$ of turning the intermediate results into a proper property graph is reasonable. For rules R14–R17, computing the output property graph is even comparatively more efficient (the overhead is 0.9). The worst case is rules R10–R13, which exhibit an overhead of 2.4. Second, the ratio $O/I$ is also reasonable, ranging from 1 to 2.4. This shows that, in practical contexts, $Int(G,T)$ can be assumed to have a size comparable to $|T(G)|$.
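Assuming the per-element times are plain averages, i.e., a total time divided by the corresponding number of elements (this normalization is an assumption; it is not spelled out in the text), the reported quantities are related as follows:

```latex
\[
t_{int}^{e} = \frac{t_{int}}{|Int(G,T)|}, \qquad
t^{e} = \frac{t}{|T(G)|}, \qquad
O/I = \frac{|T(G)|}{|Int(G,T)|}, \qquad
\frac{t^{e}}{t_{int}^{e}} = \frac{t / t_{int}}{O/I}.
\]
```

Under this reading, an overhead below 1 is possible even though $t \geq t_{int}$: it suffices that the output is sufficiently larger than the intermediate data, so that the total time is spread over more elements.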

Thus, we have demonstrated that the overhead incurred by producing a property graph rather than a set of bindings is acceptable for a realistic transformation in a real-life integration scenario.

11. Related Work

Schema mapping and data exchange. Specifying the relationship between two relational (or XML) schemas using a set of declarative assertions is a task known as schema mapping (10.1145/1065167.1065176; fagin_data_2005; bellahsene2011schema). This relationship is usually not functional, i.e., given an input instance $I$, several target instances satisfying the mapping constraints may exist.

Schema mappings and data exchange have been studied in (10.1145/2448496.2448520; 10.1145/3034786.3056113) for graph databases. The mapping languages considered are based on classical graph database queries such as regular path queries (Barcel2012RelativeEO), limited in their expressivity by not supporting data values. Moreover, answering queries on the target is already intractable in data complexity for RPQs (10.1145/2448496.2448520) and undecidable for data RPQs (10.1145/3034786.3056113). In comparison, our transformation framework provides more flexibility by including the support for data values, and any target query can be answered by simple execution on the produced property graph.

Graph transformations. Graph database transformations defined using Datalog-like rules based on acyclic conjunctive two-way regular path queries have been investigated in (10.1145/3584372.3588654). They study three fundamental static analysis problems: type checking, equivalence of transformations under graph schemas, and schema elicitation. They show all these problems to be in ExpTime.

A key difference with our work lies in the graph database model they consider, which does not have data values. We have seen that dealing with data values gives rise to the consistency checking problem, which is key to understanding whether a property graph transformation is well-defined. Moreover, their query language, a fragment of Datalog, is not practical for querying property graphs (7ad59132cb3c45e2851f565fbb703cea). Another difference is that they use a single dedicated node constructor for each label. In Sections 4 and 8, we have seen that this approach is too rigid for dealing with multiple labels.

Object-creating functions. The Skolem functions we use in our constructors resemble the object-creating functions used in the object-oriented database model (10.1145/290179.290182; 10.5555/645916.671975). Among transformation languages based on oid generation, StruQL (10.1145/262762.262763) specifically operates on object-oriented semi-structured instances where nodes can either be data values or contain an oid, and labeled edges can connect oid nodes to oid or value nodes. The major difference with our work is that they have multi-valued attributes, i.e., an oid node may be connected via $a$-edges to several value nodes. Hence, additional integrity constraints are necessary to ensure a correct modeling of property graphs in their model. Therefore, they did not take into account the problem of consistency.

Interoperability of graph data. Although RDF, RDF-star and the property graph data model share striking similarities, all being based on elementary graph concepts like nodes and edges, intricate interoperability issues arise when attempting to exchange data between them. RDF-star notably allows for annotating RDF triples with metadata, which is notoriously difficult to capture within the property graph data model, as witnessed in (abuoda_transforming_2022).

Transformation languages between graph data models thus focus primarily on solving the well-known impedance mismatch problem (bernstein_model_2007), which does not arise in our setting because we have property graphs for both input and output. Our transformation language can thus be more expressive, and can be executed by the graph database management system itself.

Mining the identities of nodes across networks. Network alignment is a technique for finding node correspondences between two or more networks. It can be used, for example, to associate nodes from different social networks with the same user (10.1145/3340531.3412168). Nodes are identified based on their similarities with respect to both their features (i.e., their properties) and their neighborhood.

While these methods are not part of graph transformation formalisms, they can be used to guide the construction of graph transformations. For instance, in Section 10.1, the results of network alignment (the similarity edges in the Offshore Leaks Database) were leveraged to better integrate data coming from multiple leaks.

12. Conclusion

Our research is the first to lay the theoretical foundations for declarative property graph transformations, and to facilitate practical solutions for turning such specifications into executable scripts in modern property graph query languages. New challenges arise from the specification of property-aware transformations, notably the task of checking whether a transformation is consistent. Using a proof-of-concept implementation of our formalism in openCypher, we showcase the efficiency of our approach for transforming property graphs on both real-world and synthetic datasets.

This work paves the way for obtaining compositional semantics for graph query languages. As a future direction, we will investigate the model extensions needed for such semantics, by addressing label and path variables, and aggregates. Meanwhile, our framework can already seamlessly support the group variables of GPC because those are lists of identifiers that can be flattened into the identifier lists of the constructors. Finally, we will investigate how to assist users in the design process of their transformation rules, for instance by lifting schema matching techniques (bernstein_model_2007; bellahsene2011schema) from relational to property graph schemas.

References

  • Abiteboul and Kanellakis (1998) Serge Abiteboul and Paris C. Kanellakis. 1998. Object Identity as a Query Language Primitive. J. ACM 45, 5 (1998), 798–842.
  • Abuoda et al. (2022) Ghadeer Abuoda, Daniele Dell’Aglio, Arthur Keen, and Katja Hose. 2022. Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches. In QuWeDa 2022. 17–32.
  • Arenas et al. (2010) Marcelo Arenas, Pablo Barceló, Leonid Libkin, and Filip Murlak. 2010. Relational and XML Data Exchange (1st ed.). Morgan and Claypool Publishers.
  • Arocena et al. (2015) Patricia C. Arocena, Boris Glavic, Radu Ciucanu, and Renée J. Miller. 2015. The IBench Integration Metadata Generator. Proc. VLDB Endow. 9, 3 (2015), 108–119.
  • Arocena et al. (2013) Patricia C. Arocena, Boris Glavic, and Renée J. Miller. 2013. Value Invention in Data Exchange. In SIGMOD. 157–168.
  • Barceló et al. (2013) Pablo Barceló, Jorge Pérez, and Juan Reutter. 2013. Schema Mappings and Data Exchange for Graph Databases. In ICDT. 189–200.
  • Barceló et al. (2012) Pablo Barceló, Jorge Pérez, and Juan L. Reutter. 2012. Relative Expressiveness of Nested Regular Expressions. In AMW. 180–195.
  • Barceló Baeza (2013) Pablo Barceló Baeza. 2013. Querying Graph Databases. In PODS. 175–188.
  • Bellahsene et al. (2011) Z. Bellahsene, A. Bonifati, and E. Rahm. 2011. Schema Matching and Mapping.
  • Bernstein and Melnik (2007) Philip A. Bernstein and Sergey Melnik. 2007. Model Management 2.0: Manipulating Richer Mappings. In SIGMOD. 1–12.
  • Boneva et al. (2023) Iovka Boneva, Benoît Groz, Jan Hidders, Filip Murlak, and Slawek Staworko. 2023. Static Analysis of Graph Database Transformations. In PODS. 251–261.
  • Bonifati et al. (2017) Angela Bonifati, Ugo Comignani, Emmanuel Coquery, and Romuald Thion. 2017. Interactive Mapping Specification with Exemplar Tuples. In SIGMOD. 667–682.
  • Bonifati et al. (2018) Angela Bonifati, G.H.L. Fletcher, Hannes Voigt, and N. Yakovets. 2018. Querying graphs. Morgan & Claypool Publishers.
  • Bonifati et al. (2024) Angela Bonifati, Filip Murlak, and Yann Ramusat. 2024. Transforming Property Graphs (Appendix). https://github.com/yannramusat/TPG/blob/main/Appendix.pdf
  • Chiticariu and Tan (2006) Laura Chiticariu and Wang-Chiew Tan. 2006. Debugging schema mappings with routes. In PVLDB. 79–90.
  • Deutsch et al. (2022) Alin Deutsch, Nadime Francis, Alastair Green, Keith Hare, Bei Li, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Wim Martens, Jan Michels, Filip Murlak, Stefan Plantikow, Petra Selmer, Oskar van Rest, Hannes Voigt, Domagoj Vrgoc, Mingxi Wu, and Fred Zemke. 2022. Graph Pattern Matching in GQL and SQL/PGQ. In SIGMOD. 2246–2258.
  • Fagin et al. (2005b) Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. 2005b. Data Exchange: Semantics and Query Answering. TCS 336, 1 (2005), 89–124.
  • Fagin et al. (2005a) Ronald Fagin, Phokion G. Kolaitis, and Lucian Popa. 2005a. Data Exchange: Getting to the Core. TODS 30, 1 (2005), 174–210.
  • Fernandez et al. (1997) Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. 1997. A Query Language for a Web-Site Management System. SIGMOD 26, 3 (1997), 4–11.
  • Fiandor and Hunger ([n.d.]) Miguel Fiandor and Michael Hunger. [n.d.]. Offshoreleaks Data Packages. Retrieved March 1, 2024 from https://github.com/ICIJ/offshoreleaks-data-packages
  • Francis et al. (2023a) Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoc. 2023a. GPC: A Pattern Calculus for Property Graphs. In PODS. 241–250.
  • Francis et al. (2023b) Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoč. 2023b. A Researcher’s Digest of GQL. In ICDT, Vol. 255. 1–22.
  • Francis et al. (2018) Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. 2018. Cypher: An Evolving Query Language for Property Graphs. In SIGMOD. 1433–1445.
  • Francis and Libkin (2017) Nadime Francis and Leonid Libkin. 2017. Schema Mappings for Data Graphs. In PODS. 389–401.
  • Garey and Johnson (1990) Michael R. Garey and David S. Johnson. 1990. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co.
  • Green et al. (2019) Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Martin Schuster, Petra Selmer, and Hannes Voigt. 2019. Updating graph databases with Cypher. Proc. VLDB Endow. 12, 12 (2019), 2242–2254.
  • Hull and Yoshikawa (1990) Richard Hull and Masatoshi Yoshikawa. 1990. ILOG: Declarative Creation and Manipulation of Object Identifiers. In VLDB. 455–468.
  • Kolaitis (2005) Phokion G. Kolaitis. 2005. Schema Mappings, Data Exchange, and Metadata Management. In PODS. 61–75.
  • Neo4j (2023a) Neo4j. 2023a. APOC user guide for Neo4j 5. Retrieved November 9, 2023 from https://neo4j.com/docs/apoc/current/
  • Neo4j (2023b) Neo4j. 2023b. Graph Data Modeling Fundamentals. Retrieved November 9, 2023 from https://graphacademy.neo4j.com/courses/modeling-fundamentals/
  • Skavantzos and Link (2023) Philipp Skavantzos and Sebastian Link. 2023. Normalizing Property Graphs. Proc. VLDB Endow. 16, 11 (2023), 3031–3043.
  • van Rest et al. (2016) Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. 2016. PGQL: A Property Graph Query Language. In GRADES. 1–6.
  • Vardi (2016) Moshe Y. Vardi. 2016. A Theory of Regular Queries. In PODS. 1–9.
  • Zhang and Tong (2020) Si Zhang and Hanghang Tong. 2020. Network Alignment: Recent Advances and Future Directions. In CIKM. 3521–3522.