Logical reduction of relations: from relational databases to Peirce’s reduction thesis

Sergiy Koshkin

Department of Mathematics and Statistics
University of Houston-Downtown
One Main Street
Houston, TX 77002
e-mail: koshkins@uhd.edu
Abstract

We study logical reduction (factorization) of relations into relations of lower arity by Boolean or relative products that come from applying conjunctions and existential quantifiers to predicates, i.e. by primitive positive formulas of predicate calculus. Our algebraic framework unifies natural joins and data dependencies of database theory and relational algebra of clone theory with the bond algebra of C.S. Peirce. We also offer new constructions of reductions, systematically study irreducible relations and reductions to them, and introduce a new characteristic of relations, ternarity, that measures their ‘complexity of relating’ and allows us to refine reduction results. In particular, we refine Peirce’s controversial reduction thesis, and show that reducibility behavior is dramatically different on finite and infinite domains.


Keywords: relational algebra, relation scheme, attribute, Cartesian product, Boolean product, natural join, primitive positive formula, constraint satisfaction problem, co-clone, project-join expression, relative product, irreducible relation, teridentity, bonding algebra, Peirce’s reduction thesis, subcubic graph

Introduction

We study decomposition of relations into simpler relations by logical operations expressible in terms of conjunctions and existential quantifiers on predicates, i.e. by primitive positive formulas of predicate calculus [22]. Such analysis goes back to the work of C.S. Peirce at the dawn of algebraic logic, see [7, 9, 21] for modern accounts. However, while some aspects of it have been developed by Schröder, Löwenheim, Tarski, and others [10], they gradually shifted the focus to predicates in formal theories and questions of axiomatization. On the other hand, decomposition of relations in algebras of relational clones or co-clones is closely related to Peirce’s, but was developed independently of his work, from Post’s study of the dual decomposition of functions on finite sets, see [23, 33]. Peirce’s results were largely forgotten until modern formalizations of his bond algebra in [8, 17].

The original interest was linguistic and philosophical. A different stimulus came from the relational database model introduced by Codd [12], and developed by Fagin [16], Rissanen [34], and others. It featured a notion of decomposition that is weaker than Peirce’s but closely related to it. Unfortunately, the two streams of literature remained largely disjoint. This paper is, in part, a unified survey of classical results that are little known to logicians and mathematicians, together with their interconnections, and, in part, their extension by the author.

From the mathematical perspective, join decomposition of relations is somewhat analogous to factorization of polynomials. There is even a ready analog of a polynomial’s degree, which measures its ‘multiplicative complexity’: a relation’s arity (adicity, rank), the number of its attributes (places, positions). There is an even stronger analogy to the recent theory of factorization in highly non-cancellative monoids with a degree-like height function [37]. Accordingly, we will call decomposition terms factors, and call decompositions reductions when all the factors have strictly lower arity than the relation itself.

A technical device of database theory that we will extensively use in this paper is the calculus of attributed relations (the term is due to [38], see also the named perspective in [1, 3.2]). Those are collections of maps from a set of attributes, called the relation scheme, to the domain, rather than ordered lists of domain elements, as in ordinary set-theoretic relations. This allows us to identify positions in different relations through relation schemes, the same flexibility one gets by placing the same variable into different predicates in logical formulas, and removes much of the combinatorial clutter that plagues definitions and the calculus of operations on ordinary relations (compare to [8, 14]). Moreover, operations on attributed relations have better algebraic properties.

For example, the Cartesian product is commutative and associative on attributed relations, albeit only partially defined: relations that share attributes cannot be multiplied. What we get by extending it to all attributed relations is classically known as the Boolean product, and it is still commutative and associative. In logical terms, it amounts to taking conjunctions of predicates with some variables identified instead of only free conjunctions. In predicate calculus, such expressions are called quantifier-free primitive positive formulas [22]. The iteration of the Boolean product, called the natural join, is the primary operation of database theory. The reason is that join decompositions are closely associated with various data dependencies, and can be used to store, query and update the data more efficiently [2, 3, 16, 24, 28, 34].

We start by reviewing some known join reduction methods that rely on exploiting various dependencies among a relation’s tuples. The simplest one is functional dependency, when there is an attribute, called a key, whose value determines the rest of its tuple (ID columns play this role in database tables). However, despite the abundance of reduction methods, it turns out that join irreducible relations also abound. Not only are there ones of arbitrarily high arity, but also ‘almost all’ relations are join irreducible on large domains (Theorem 8). Although this fact must have been known to experts, the author did not encounter it stated in print.

Thus, join reduction is analogous to factorization of polynomials over the field of rationals, with irreducibles of arbitrarily high degrees. This prompts one to look for extensions with a more manageable set of irreducibles. However, in line with the literature on the subject, rather than enlarging the class of relations we opt for a stronger algebra on the same class. The stronger operation is the projective join (projoin for short) that combines join with projections. In logical terms, we allow existential quantification of (‘projecting out’) some variables in predicate conjunctions, i.e. we consider all primitive positive predicate formulas.

While some project-join algebras have been studied in the more theoretical database literature [15, 38], they appeared much more prominently in the theory of constraint satisfaction problems (CSP) that also dates back to the 1970s. The constraints are expressed by relations on finite domains, and the problems are to decide whether relations from a given set can simultaneously hold on some elements of the domain (be satisfied). Computational complexity of CSP was actively studied in computer science, and in 1978 Schaefer discovered a deep connection between it and closure properties of sets of relations under projective joins on $2$-element domains [35]. The work initiated by Feder-Vardi and Jeavons in the 1990s extended this connection to all finite domains, giving rise to what is now called the algebraic approach to complexity of CSP [6]. In particular, it turned out that if constraints are projoin complete, i.e. generate all relations, then the CSP is NP-complete (assuming $\mathrm{P}\neq\mathrm{NP}$). In general, projoin closures of relations are called relational clones or co-clones, and Schaefer characterized NP-completeness in terms of them on $2$-element domains. They are dual to the functional clones of Post and have been studied since the 1960s [23, 33]. Join closures, called weak partial co-clones, also found applications to studying complexity of CSP, namely to refined complexity classification of NP-complete problems [20].

The projoin reduction problem on finite domains can be reformulated as asking whether the set of all unary and binary relations is projoin complete, i.e. whether they generate the co-clone of all relations. In hindsight, the affirmative answer to the same question on infinite domains goes back to Peirce. The proof device, introduced by Peirce under the quaint name of ‘hypostatic abstraction’, amounts, in database terms, to attaching key attribute(s) to the relation, and then projecting them out after join reducing the augmented relation (Theorem 9). Reducibility of all relations to unary and binary ones is more analogous to factorization of polynomials over the field of real numbers, where they reduce to linear and quadratic factors.

Counterintuitively, the situation is much more complex on finite domains, and we introduce some further devices that allow us to projoin reduce some relations when hypostatic abstraction does not (Section 5). They exploit the connection between projections and unions (existential quantification and disjunction). In particular, we generalize to projoins Fagin’s characterization of certain joins in terms of multivalued dependencies (Theorem 11), and show how to convert complements (negations) of certain joins into projoins. However, we do not resort to a central device of clone theory, the Pol-Inv Galois connection [6, 23], as our approach is to develop the more elementary Peircean methods that construct reductions explicitly.

While projection and join can be conveniently folded into a single projoin operation, this operation is not an iteration of any binary operation, like Cartesian product and join were. However, one can iterate a particular binary projoin classically known as relative product [7] (composition of binary relations is its restriction to them). Relative product is associative only on a restricted class of factors, and generates only an (ostensibly) narrow subclass of projoins, which we call bonds following [17].

The bond algebra is, more or less, Peirce’s project-join algebra. Surprisingly, he established that bond reducibility is almost equivalent to general projoin reducibility. All relations of arity $4$ and higher are projoin reducible if and only if they are bond reducible, and it is only on ternaries (ternary relations) that the two notions diverge (Theorem 13). There are some projoin reducible but bond irreducible ternaries, notably the teridentity relation $I_{3}$ that contains all and only identical triples of domain elements and plays a key role in converting projoins into bonds. Bond reducibility of all relations on infinite domains to unaries, binaries and ternaries is known as Peirce’s reduction thesis (see Theorem 19 for a precise formulation), and it is equivalent to the better known projoin reducibility to binaries alone, see e.g. [27]. The seeming discrepancy fueled a long historical controversy [9, 21].

Motivated by the thesis, we introduce the notion of ternarity of a relation as the minimal number of ternaries in its bond reductions to unaries, binaries and ternaries, and study its properties. The reason for singling out ternaries is that they do the main work of ‘relating’ different attributes in a relation as represented by reductions. We establish a close connection between complete bond reductions and subcubic graphs, and prove that ternarity of non-degenerate $n$-ary relations is always $n-2$ on infinite domains (Theorem 16), one unit of ternarity per unit of arity over $n=2$. This refines the original reduction thesis. The proof uses graph-theoretic methods, pioneered by Peirce and prominent in the recent work on reduction [13, 14]. A counterexample then shows that on finite domains this equality can fail already for $n=4$, and we prove that it fails for ‘almost all’ relations on large finite domains.

The paper is organized as follows. Section 1 introduces our terminology and notation. In Sections 2-3 we review some standard results on Cartesian products and natural joins, stated in terms of attributed relations, and add results on join irreducibility. Sections 4-5 introduce projective joins (projoins) and associated reduction methods, old and new, including the hypostatic abstraction that settles the reduction problem on infinite domains. The irreducibility results are weaker for projoins, and only concern some restricted classes of them. We also explain why reduction behavior is so different in finite and infinite cases. In Section 6 we introduce Peirce’s bonds, and explain his algorithm for converting projoins into them that features teridentity. Peirce’s reduction thesis is then derived as a corollary of the results for projoins. Section 7 defines projoin graphs and bonding diagrams that allow us to apply graph-theoretic methods to bond reductions in Section 8, which studies complete reductions and introduces ternarity. In Section 9 we use ternarity to refine Peirce’s reduction thesis and give some counterexamples. In the last section, we summarize our conclusions and state some open problems.

1 Preliminaries

We use the standard set-theoretic notation and terminology for sets and relations [25]. Relations are defined on a set $\mathcal{D}$ called the domain and are subsets of its Cartesian powers $\mathcal{D}\times\dots\times\mathcal{D}$. When $R\subseteq\mathcal{D}^{n}$ the number $n$ is called the relation’s arity and the relation is called an $n$-ary relation or simply an $n$-ary, used as a noun. For $n=1,2,3,4$ we use the shorthands unary, binary, ternary, quaternary, respectively.

Elements of $\mathcal{D}^{n}$ are called $n$-tuples or just tuples, when $n$ is understood or immaterial. If $a\in\mathcal{D}^{n}$, its $i$-th member is denoted $a_{i}$. We adopt the usual convention of canonically identifying tuples of tuples with longer tuples, and hence of identifying $\mathcal{D}^{n}\times\mathcal{D}^{m}$ with $\mathcal{D}^{n+m}$, and so on. It is often convenient to interpret tuples as maps from the set of the relation’s positions $\mathbb{N}_{n}:=\{1,2,\dots,n\}$ to $\mathcal{D}$.

Some standard $n$-aries that can be defined on any domain will be called and denoted as follows: the empty relations $\emptyset_{n}$ with no tuples; the universal relations $U_{n}:=\mathcal{D}^{n}$ that contain all possible $n$-tuples; the identity relations $I_{n}$ that contain all and only $n$-tuples with identical members; and the diversity relations $D_{n}$ that contain all and only $n$-tuples with pairwise distinct members. Each of those can be relativized to proper subsets $\mathcal{A}\subset\mathcal{D}$ that we will indicate by the upper index, e.g. $I_{n}^{\mathcal{A}}$ contains all and only $n$-tuples with identical members from $\mathcal{A}$.

The set of all maps from $S$ to $\mathcal{D}$ is denoted $\mathcal{D}^{S}$, the set of all subsets of $S$ is denoted $\mathcal{P}(S)$, and the cardinality of $S$ is denoted $|S|$. As is well known, $|\mathcal{D}^{n}|=|\mathcal{D}|^{n}$, $|\mathcal{D}^{S}|=|\mathcal{D}|^{|S|}$, and $|\mathcal{P}(S)|=2^{|S|}$.

An attributed relation $R$ is a subset $R\subseteq\mathcal{D}^{\Sigma}$ [13, 15, 38], where $\Sigma$ is a set called the relation scheme, whose elements are called attributes. The arity of $R$ is defined to be $|\Sigma|$, and the elements of $R$, which are now functions from $\Sigma$ to $\mathcal{D}$, are also called tuples. We only consider finitary relations, so $\Sigma$ is always finite; the ordinary relations correspond to $\Sigma=\mathbb{N}_{n}$. When $\Sigma$ is linearly ordered, e.g. $\Sigma\subset\mathbb{N}$, there is a canonical 1-1 correspondence between $\Sigma$ and $\mathbb{N}_{|\Sigma|}$ obtained by listing elements of $\Sigma$ in order, which induces a canonical 1-1 correspondence between relations on the scheme $\Sigma$ and ordinary relations.
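The correspondence between attributed and ordinary relations can be illustrated by a minimal Python sketch. It is not part of the paper: it assumes that an attributed tuple is stored as a frozenset of (attribute, value) pairs and a relation as a set of such tuples, and the names `to_ordinary` and `to_attributed` are hypothetical.

```python
def to_ordinary(relation, scheme):
    """List each attributed tuple's values in increasing order of attributes."""
    order = sorted(scheme)
    return {tuple(dict(t)[attr] for attr in order) for t in relation}


def to_attributed(ordinary, scheme):
    """Inverse map: label the i-th position with the i-th smallest attribute."""
    order = sorted(scheme)
    return {frozenset(zip(order, row)) for row in ordinary}


# A binary attributed relation on the scheme {2, 5}, a subset of the naturals.
scheme = {2, 5}
R = {frozenset({(2, "a"), (5, "b")}), frozenset({(2, "c"), (5, "a")})}

ordinary = to_ordinary(R, scheme)             # pairs (value at 2, value at 5)
round_trip = to_attributed(ordinary, scheme)  # back to the attributed form
```

Here the attributes 2 and 5 are matched with the positions 1 and 2 of an ordinary binary relation, and converting back recovers the attributed relation.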

When $\Lambda\subseteq\Sigma$ we will denote by $a_{\Lambda}\in\mathcal{D}^{\Lambda}$ the attributed subtuple of $a$ consisting of its members in the $\Lambda$ positions. When $\Lambda,\mathrm{M}\subseteq\Sigma$, and $\alpha\in\mathcal{D}^{\Lambda}$, $\beta\in\mathcal{D}^{\mathrm{M}}$ with $\alpha_{\Lambda\cap\mathrm{M}}=\beta_{\Lambda\cap\mathrm{M}}$, we define their concatenation $\alpha\mid\beta\in\mathcal{D}^{\Lambda\cup\mathrm{M}}$ as the union of ordered pairs when they are taken as functions from $\Lambda$ and $\mathrm{M}$ to $\mathcal{D}$. The intersection condition is needed for the union to also be a function, and the concatenation is always defined when $\Lambda,\mathrm{M}$ are disjoint. When $\Sigma\subset\mathbb{N}$ the members of the concatenated tuple are listed according to the order of their attributes in $\mathbb{N}$.
For example, $((\alpha_{1},\alpha_{3})\mid(\beta_{2},\beta_{3},\beta_{5}))=(\alpha_{1},\beta_{2},\alpha_{3},\beta_{5})$ assuming $\alpha_{3}=\beta_{3}$.
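A short Python sketch of concatenation may help fix the definition. It is an illustration only: attributed tuples are assumed to be stored as dicts from attributes to values, and the helper names `concat` and `listed` are hypothetical.

```python
def concat(alpha, beta):
    """Concatenation alpha | beta of attributed tuples stored as dicts.

    Raises ValueError when the tuples disagree on a shared attribute,
    since the union of the two maps would then not be a function.
    """
    for attr in alpha.keys() & beta.keys():
        if alpha[attr] != beta[attr]:
            raise ValueError(f"tuples disagree on attribute {attr!r}")
    return {**alpha, **beta}


def listed(t):
    """List the members of a tuple in the order of its attributes."""
    return tuple(t[attr] for attr in sorted(t))


alpha = {1: "p", 3: "r"}             # tuple on the scheme {1, 3}
beta = {2: "q", 3: "r", 5: "s"}      # tuple on {2, 3, 5}, agreeing with alpha at 3
joined = concat(alpha, beta)         # tuple on the scheme {1, 2, 3, 5}
```

The result mirrors the example in the text: the shared attribute 3 contributes a single member, and `listed(joined)` interleaves the members of the two tuples by attribute order.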

In the database model relations are visualized as rectangular tables with columns labeled by positions (attributes) and rows listing the tuple members. Database queries are then interpreted as operations on relations that extract relevant information and package it into simpler relations. Two operations inspired by this interpretation will be useful to us. The first is projection to a subset of positions $\Lambda\subseteq\Sigma$:

\[ \pi_{\Lambda}R:=\{a_{\Lambda}\,\big|\,a\in R\}, \]

that simply deletes all non-$\Lambda$ columns and removes duplicate tuples, if any, in the remaining $\Lambda$ columns. When no confusion results, we write simply $\pi_{i_{1},\dots,i_{k}}R$ instead of $\pi_{\{i_{1},\dots,i_{k}\}}R$. The second operation is selection over a subset:

\[ \sigma_{x_{\Lambda}=\alpha}R:=\{a_{\Lambda^{c}}\,\big|\,a\in R,\ a_{\Lambda}=\alpha\}, \tag{1} \]

that leaves only rows with prescribed values in the $\Lambda$ columns (given by $\alpha\in\mathcal{D}^{\Lambda}$), and then deletes those columns. Here $\Lambda^{c}:=\Sigma\backslash\Lambda$ is the complement of $\Lambda$. Note that both $\pi_{\Lambda}R$ and $\sigma_{x_{\Lambda}=\alpha}R$ are attributed relations on the schemes $\Lambda$, $\Lambda^{c}$, respectively.
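Projection and selection on attributed relations can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's code: a tuple is a frozenset of (attribute, value) pairs, a relation is a set of tuples, and `tup`, `project` and `select` are hypothetical names.

```python
def tup(mapping):
    """Freeze a dict {attribute: value} into a hashable attributed tuple."""
    return frozenset(mapping.items())


def project(relation, attrs):
    """pi_Lambda R: restrict every tuple to attrs; the set drops duplicates."""
    attrs = set(attrs)
    return {frozenset((k, v) for k, v in t if k in attrs) for t in relation}


def select(relation, alpha):
    """sigma_{x_Lambda = alpha} R: keep tuples equal to alpha on its
    attributes, then delete those attributes."""
    kept = set()
    for t in relation:
        row = dict(t)
        if all(row[k] == v for k, v in alpha.items()):
            kept.add(frozenset((k, v) for k, v in t if k not in alpha))
    return kept


# A ternary relation on the scheme {1, 2, 3}.
R = {tup({1: "a", 2: "b", 3: "c"}),
     tup({1: "a", 2: "d", 3: "c"}),
     tup({1: "e", 2: "b", 3: "f"})}

proj_13 = project(R, {1, 3})            # a relation on the scheme {1, 3}
sel_13 = select(R, {1: "a", 3: "c"})    # a relation on the scheme {2}
```

The first two rows of `R` agree on attributes 1 and 3, so the projection has two tuples rather than three, while the selection returns the two remaining values of attribute 2.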

Unless otherwise stated, all predicates will be interpreted on a domain, and relations will be identified with the predicates they interpret. In particular, the same letter will be used for a relation and its predicate, i.e. $R(a_{1},\dots,a_{n})$ will mean the same as $(a_{1},\dots,a_{n})\in R$, and $\neg R$ will denote the complement of $R$ in $\mathcal{D}^{\Sigma}$. This identification is convenient because attribution maps are naturally expressed in predicates by placing the same variables into multiple positions.

The same notational conventions apply to tuples of variables as to tuples of values, e.g. $x_{\Lambda}$ is the subtuple of variables with the indices from $\Lambda$, $x_{\Lambda}\mid x_{\mathrm{M}}$ is the concatenation of variable tuples, etc. The standard logical operations, conjunction $\land$, disjunction $\lor$, etc., will be used with the usual meaning, and for $\Lambda=\{i_{1},\dots,i_{k}\}$ the multiple quantification $\exists x_{i_{1}}\dots\exists x_{i_{k}}R(x)$ will be abbreviated as $\exists x_{\Lambda}R(x)$.
With these conventions and in terms of predicates, the projection is expressed simply as $\pi_{\Lambda}R\,(x_{\Lambda})=\exists x_{\Lambda^{c}}R(x)$, and the selection as $\sigma_{x_{\Lambda}=\alpha}R\,(x_{\Lambda^{c}})=R(\alpha\mid x_{\Lambda^{c}})$. For example, $\pi_{1,3}R\,(x,y)=\exists t\,R(x,t,y)$, and $\sigma_{x_{1,3}=(\alpha',\alpha'')}R\,(z)=R(\alpha',z,\alpha'')$.

2 Cartesian Products

The simplest way a relation decomposes into lower arity ones is when it is a Cartesian product of them. We need a slight generalization so that the relation does not cease to be a Cartesian product simply because its positions are permuted. This is taken care of when using attributed relations. We give the definition directly for any finite number of factors, but one can see that it comes from iterating the binary product. Recall that a partition $\Sigma=\Lambda_{1}\cup\dots\cup\Lambda_{m}$ of a set $\Sigma$ is its representation as a disjoint union of subsets.

Definition 1.

Given disjoint finite subsets $\Lambda_{i}\subseteq\Sigma$ and attributed relations $R_{i}\subseteq\mathcal{D}^{\Lambda_{i}}$, their Cartesian product $R\subseteq\mathcal{D}^{\,\cup_{i}\Lambda_{i}}$ is the set of concatenations of tuples from $R_{i}$:

\[ R_{1}\times\dots\times R_{m}:=\{(a^{1}|\dots|\,a^{m})\,\big|\,a^{i}\in R_{i}\}. \]

A relation $R\subseteq\mathcal{D}^{\Sigma}$ is a Cartesian product over a partition $\Sigma=\Lambda_{1}\cup\dots\cup\Lambda_{m}$ when there exist $R^{\Lambda_{i}}\subseteq\mathcal{D}^{\Lambda_{i}}$ such that $R=R^{\Lambda_{1}}\times\dots\times R^{\Lambda_{m}}$, and $R^{\Lambda_{i}}$ are called its Cartesian factors. It is called degenerate when it is a Cartesian product over a partition with $m>1$ and all $\Lambda_{i}\neq\emptyset$, and it is called $\boldsymbol{l}$-aric when all factors have the same arity $l$.

We reiterate that, on this definition, Cartesian factors are attributed relations, and $R$ is not their Cartesian product in the ordinary sense, but rather some permutation of it. By definition, Cartesian factorization is a reduction because all $R^{\Lambda_{i}}$ have smaller arity $|\Lambda_{i}|<n$, and when it exists, the factors are none other than the projections $R^{\Lambda_{i}}=\pi_{\Lambda_{i}}R$.

The definition is more straightforward in terms of predicates: it means that the predicate of $R$ is a free conjunction of lower-arity predicates:

$$R(x_{1},\dots,x_{n})=R^{\Lambda_{1}}(x_{\Lambda_{1}})\land\dots\land R^{\Lambda_{m}}(x_{\Lambda_{m}}). \tag{2}$$

The following is a characteristic property of Cartesian products.

Theorem 1 (Independence criterion).

A relation $R\subseteq\mathcal{D}^{\Sigma}$ is a Cartesian product over a partition $\Sigma=\Lambda_{1}\cup\dots\cup\Lambda_{m}$ if and only if the values of its tuples on $\Lambda_{i}$ can be chosen independently, i.e. for any collection of $\alpha^{i}\in\pi_{\Lambda_{i}}R$ there exists a common $a\in R$ such that $a_{\Lambda_{i}}=\alpha^{i}$.

Proof.

Let $R$ be a Cartesian product over $\Lambda_{i}$. If $\alpha^{i}\in\pi_{\Lambda_{i}}R=R^{\Lambda_{i}}$ then $R^{\Lambda_{i}}(\alpha^{i})$ holds for all $i$, and hence $R^{\Lambda_{1}}(\alpha^{1})\land\dots\land R^{\Lambda_{m}}(\alpha^{m})$ holds. But by (2) this means that $R(a)$ holds for the concatenation $a:=(\alpha^{1}|\dots|\,\alpha^{m})$. Thus, $a\in R$ and $a_{\Lambda_{i}}=\alpha^{i}$.

Conversely, suppose that $R$ satisfies the independence condition over $\Lambda_{i}$. If $a\in R$ then $a_{\Lambda_{i}}\in\pi_{\Lambda_{i}}R$ by definition of projection, so $R\subseteq\pi_{\Lambda_{1}}R\times\dots\times\pi_{\Lambda_{m}}R$. On the other hand, if $a\in\pi_{\Lambda_{1}}R\times\dots\times\pi_{\Lambda_{m}}R$ then $a_{\Lambda_{i}}\in\pi_{\Lambda_{i}}R$, and, by the independence condition, there is a common $b\in R$ with $b_{\Lambda_{i}}=a_{\Lambda_{i}}$. Since $\Lambda_{i}$ form a partition we have $a=b\in R$. Thus, $\pi_{\Lambda_{1}}R\times\dots\times\pi_{\Lambda_{m}}R\subseteq R$ and they are equal. ∎
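For finite relations the criterion of Theorem 1 can be tested mechanically: project onto each block, recombine all choices of subtuples, and compare with the original relation. The following Python sketch is only an illustration under our own conventions, not part of the formal development. The names `project` and `is_cartesian_product` are hypothetical, and attributes are assumed to be the positions $0,\dots,n-1$.

```python
from itertools import product

def project(rel, positions):
    """Projection pi_Lambda R: restrict every tuple to the given positions."""
    return {tuple(t[p] for p in positions) for t in rel}

def is_cartesian_product(rel, partition, arity):
    """Test the independence criterion over a given partition.

    Assumption: `partition` is a list of disjoint lists of positions
    that together cover range(arity).
    """
    projections = [project(rel, block) for block in partition]
    rebuilt = set()
    # choose one subtuple per block independently and concatenate them
    for choice in product(*projections):
        t = [None] * arity
        for block, sub in zip(partition, choice):
            for p, v in zip(block, sub):
                t[p] = v
        rebuilt.add(tuple(t))
    return rebuilt == rel

# a product of {0, 1} on position 0 with the binary identity on positions 1, 2
R = {(a, b, b) for a in (0, 1) for b in (0, 1)}
# the ternary identity relation on the domain {0, 1}
identity3 = {(a, a, a) for a in (0, 1)}
```

Under these assumptions `is_cartesian_product(R, [[0], [1, 2]], 3)` holds, while the same call on `identity3` fails because mixing values $0$ and $1$ across blocks leaves the relation.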

Note that if we assign equal probability to all tuples of a relation $R$ that is degenerate over a partition $\Lambda_{i}$, then the subtuples $a_{\Lambda_{i}}$ are statistically independent random vectors on this sample space [29].

When testing for degeneracy it is sufficient to consider bipartitions only. Indeed, if R𝑅Ritalic_R can be split into several factors then we can always group them into just two non-trivial factors, say, the first one and the product of the rest.

Example 1.

The independence criterion can be used directly to establish degeneracy or non-degeneracy of some relations. Clearly, all unary relations are non-degenerate on any domain. On domains $\mathcal{D}$ with $|\mathcal{D}|\geq 2$ the identity relations $I_{n}$ are non-degenerate for $n\geq 2$. Indeed, constant tuples $(\alpha,\dots,\alpha),(\beta,\dots,\beta)\in I_{n}$ for any $\alpha,\beta\in\mathcal{D}$. If $I_{n}$ were degenerate, we could, by independence, assign constant $\alpha$ values to some $\Lambda_{i}$ and constant $\beta$ values to others. But if $\alpha\neq\beta$ their concatenation will not be in $I_{n}$, contradiction.

Similarly, for $|\mathcal{D}|\geq n$ the diversity relations $D_{n}$ are non-degenerate ($D_{n}=\emptyset_{n}$ for $|\mathcal{D}|<n$). Indeed, any diverse (with no equal values) subtuple is in the projection of $D_{n}$ to any proper subset of positions. But subtuples over disjoint subsets of positions can share values, and their concatenation will not be in $D_{n}$.
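The claims of this example can be confirmed by brute force on a small domain. The Python sketch below is a hedged illustration: the function name `is_degenerate` is ours, attributes are assumed to be positions $0,\dots,n-1$, and only bipartitions are tested, as justified above. It relies on the fact that a finite relation is always contained in the product of its projections, so the two coincide exactly when their sizes agree.

```python
from itertools import combinations, permutations, product

def project(rel, positions):
    """Projection of a set of tuples onto the given positions."""
    return {tuple(t[p] for p in positions) for t in rel}

def is_degenerate(rel, arity):
    """True if rel is a Cartesian product over some bipartition into non-empty blocks."""
    for k in range(1, arity):
        for block in combinations(range(arity), k):
            rest = [p for p in range(arity) if p not in block]
            # rel sits inside the product of its projections, so equal sizes mean equality
            if len(rel) == len(project(rel, block)) * len(project(rel, rest)):
                return True
    return False

domain = (0, 1, 2)                                 # hypothetical three-element domain
identity3 = {(a, a, a) for a in domain}            # I_3
diversity3 = set(permutations(domain, 3))          # D_3: tuples with no equal values
universal3 = set(product(domain, repeat=3))        # U_3
```

Here `is_degenerate` returns `False` for `identity3` and `diversity3` and `True` for `universal3`, in agreement with the example.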

The following simple proposition gives two sufficient conditions for non-degeneracy that are easier to check in examples.

Theorem 2.

Suppose one of the following conditions holds for a relation $R\subseteq\mathcal{D}^{\Sigma}$.

(i) $R$ is not universal, but for any non-empty proper subset $\emptyset\subset\Lambda\subset\Sigma$ we have $\pi_{\Lambda}R=\mathcal{D}^{\Lambda}$.

(ii) $\neg R$ is not empty, but for any $i\in\Sigma$ we have $\pi_{i}(\neg R)\neq\mathcal{D}$.

Then $R$ is non-degenerate.

Proof.

(i) Suppose, to the contrary, that $R$ is degenerate. Then, according to (2), $R$ is a conjunction of $\pi_{\Lambda_{i}}R$ with non-empty proper subsets $\Lambda_{i}$. Since each projection is universal by assumption, so must be their conjunction, contradicting that $R$ is not universal.

(ii) Let $\delta\in\mathcal{D}$ be arbitrary. If $R$ were degenerate then, negating (2), we would obtain by De Morgan's law:

$$\neg R(x_{1},\dots,x_{n})=\neg R^{\Lambda_{1}}(x_{\Lambda_{1}})\lor\dots\lor\neg R^{\Lambda_{m}}(x_{\Lambda_{m}}).$$

Since $\neg R$ is non-empty, so is one of the disjuncts, say $\neg R^{\Lambda_{1}}$. Pick some $\alpha\in\neg R^{\Lambda_{1}}$ and some $i\not\in\Lambda_{1}$, which exists because $\Lambda_{1}$ is proper, and set $a_{i}:=\delta$, $a_{\Lambda_{1}}:=\alpha$, then assign values for $a_{j}$ with $j\not\in\Lambda_{1}\cup\{i\}$ arbitrarily. By construction, $\neg R^{\Lambda_{1}}(a_{\Lambda_{1}})$ holds and hence so does $\neg R(a)$ because $\neg R$ is a disjunction of $\neg R^{\Lambda_{i}}$. But $\delta$ was arbitrary, and so $\pi_{i}(\neg R)=\mathcal{D}$, contrary to the assumption. ∎

Example 2.

By Theorem 2 (i), the non-identity relations $\neg I_{n}$ with any $n\geq 2$ are non-degenerate (they trivially are for $n=1$) when $|\mathcal{D}|\geq 2$. Indeed, $\pi_{\Lambda}(\neg I_{n})$ is universal for any non-empty proper $\Lambda$ because any tuple constant over $\Lambda$ can be complemented by assigning a different value on $i\not\in\Lambda$, and non-constant tuples can be complemented arbitrarily, with the result in $\neg I_{n}$ in both cases.

The above reasoning no longer applies to $R:=\neg I_{n}^{\mathcal{A}}$ for $\mathcal{A}$ a non-empty proper subset of $\mathcal{D}$. However, $\pi_{i}(\neg R)=\pi_{i}(I_{n}^{\mathcal{A}})=\mathcal{A}\neq\mathcal{D}$ for any $i$ by assumption, and $\neg I_{n}^{\mathcal{A}}$ is non-degenerate by Theorem 2 (ii). In particular, by picking $\mathcal{A}=\{\alpha\}$ we see that $U_{n}\backslash\{(\alpha,\dots,\alpha)\}$ is non-degenerate for any $\alpha\in\mathcal{D}$ when $|\mathcal{D}|\geq 2$.
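Both conditions of Theorem 2 are finite checks when the domain is finite. The sketch below spells them out in Python for illustration only. The names `condition_i` and `condition_ii` are ours, the relation is assumed to be a set of $n$-tuples over the listed domain, and the three-element domain is a hypothetical choice.

```python
from itertools import combinations, product

def project(rel, positions):
    """Projection of a set of tuples onto the given positions."""
    return {tuple(t[p] for p in positions) for t in rel}

def condition_i(rel, domain, n):
    """R is not universal, yet every projection onto a non-empty proper subset is universal."""
    if len(rel) == len(domain) ** n:
        return False
    return all(
        len(project(rel, block)) == len(domain) ** k
        for k in range(1, n)
        for block in combinations(range(n), k)
    )

def condition_ii(rel, domain, n):
    """The complement of R is non-empty, yet none of its unary projections is the whole domain."""
    complement = set(product(domain, repeat=n)) - rel
    if not complement:
        return False
    return all(len(project(complement, (i,))) < len(domain) for i in range(n))

domain = (0, 1, 2)                                        # hypothetical domain
universal = set(product(domain, repeat=3))
non_identity = {t for t in universal if len(set(t)) > 1}  # the relation "not I_3"
punctured = universal - {(0, 0, 0)}                       # U_3 without one constant tuple
```

With these definitions `condition_i(non_identity, domain, 3)` and `condition_ii(punctured, domain, 3)` both hold, matching the two uses of Theorem 2 in this example.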

The above examples already show that there are non-degenerate relations of arbitrarily high arity. Therefore, the situation with Cartesian factorization is similar to factorization of polynomials over $\mathbb{Q}$, where there are irreducible polynomials of arbitrarily high degree. The next theorem shows, by a counting argument, that, as far as reductive power is concerned, the situation is even worse: ‘almost all’ relations are non-degenerate.

Theorem 3.

The share of degenerate $n$-ary relations among all such relations on a domain $\mathcal{D}$ asymptotically vanishes when $|\mathcal{D}|\geq 2$ and $n\to\infty$, or when $n\geq 2$ and $|\mathcal{D}|\to\infty$.

Proof.

The set of all $n$-ary relations on $\mathcal{D}$ is $\mathcal{P}(\mathcal{D}^{n})$, so their total number is $|\mathcal{P}(\mathcal{D}^{n})|=2^{|\mathcal{D}|^{n}}$. For any degenerate relation there is a non-empty proper subset $\Lambda\subset\mathbb{N}_{n}$ such that it is a Cartesian product over the bipartition $\Lambda\cup\Lambda^{c}$, and every bipartition is counted twice by $\Lambda$ due to the symmetry between $\Lambda$ and $\Lambda^{c}$. There are $\binom{n}{k}$ subsets of $\mathbb{N}_{n}$ with $|\Lambda|=k$, and we can assign $2^{|\mathcal{D}|^{|\Lambda|}}=2^{|\mathcal{D}|^{k}}$ and $2^{|\mathcal{D}|^{|\Lambda^{c}|}}=2^{|\mathcal{D}|^{n-k}}$ different relations to each of the Cartesian factors. Of course, some of the products obtained in this way may coincide, so we only get an upper bound for the total number $N_{deg}$ of degenerate relations:

$$N_{deg}\leq\frac{1}{2}\,\sum_{k=1}^{n-1}\binom{n}{k}\,2^{|\mathcal{D}|^{k}+|\mathcal{D}|^{n-k}}.$$

By calculus, the function $|\mathcal{D}|^{k}+|\mathcal{D}|^{n-k}$ on $1\leq k\leq n-1$ is convex in $k$ and hence takes its maximal values at the endpoints when $|\mathcal{D}|\geq 2$, so $|\mathcal{D}|^{k}+|\mathcal{D}|^{n-k}\leq|\mathcal{D}|+|\mathcal{D}|^{n-1}$ for all $k$. Therefore,

$$N_{deg}\leq 2^{|\mathcal{D}|+|\mathcal{D}|^{n-1}-1}\sum_{k=1}^{n-1}\binom{n}{k}<2^{n-1+|\mathcal{D}|+|\mathcal{D}|^{n-1}}, \tag{3}$$

because $\sum_{k=1}^{n-1}\binom{n}{k}=2^{n}-2<2^{n}$. The conclusions now follow by applying calculus to the fraction $\frac{2^{n-1+|\mathcal{D}|+|\mathcal{D}|^{n-1}}}{2^{|\mathcal{D}|^{n}}}$, whose exponent $n-1+|\mathcal{D}|+|\mathcal{D}|^{n-1}-|\mathcal{D}|^{n}$ tends to $-\infty$ in both limits. ∎
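To get a feeling for how fast the share vanishes, the bounds from the proof can be evaluated numerically. The Python sketch below is an illustration only, with hypothetical function names: it computes the first (unnumbered) bound exactly, divides it by the total number of relations, and also returns the base-2 logarithm of the fraction formed from (3).

```python
from fractions import Fraction
from math import comb

def first_bound(d, n):
    """(1/2) * sum over k of C(n, k) * 2**(d**k + d**(n - k)), for 1 <= k <= n - 1."""
    total = sum(comb(n, k) * 2 ** (d ** k + d ** (n - k)) for k in range(1, n))
    return total // 2  # every summand is even, so the division is exact

def share_upper_bound(d, n):
    """Upper bound for the share of degenerate n-ary relations on a d-element domain."""
    return Fraction(first_bound(d, n), 2 ** (d ** n))

def log2_ratio_from_3(d, n):
    """Base-2 logarithm of the right-hand side of (3) divided by 2**(d**n)."""
    return (n - 1 + d + d ** (n - 1)) - d ** n

for d, n in [(2, 3), (2, 4), (2, 10), (10, 2)]:
    print(d, n, float(share_upper_bound(d, n)), log2_ratio_from_3(d, n))
```

For small parameters the bounds are weak (for $|\mathcal{D}|=2$, $n=3$ the first bound only gives $3/4$), but the exponent in (3) quickly becomes very negative, e.g. $-501$ for $|\mathcal{D}|=2$, $n=10$.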

3 Joins and dependencies

As far as analysis of relations is concerned, Cartesian products do not take us very far. Asymptotically, in either domain size or arity, almost all relations do not factor into them. A natural move is to consider more general decompositions. In terms of predicates, Cartesian products correspond to free conjunctions, so let us allow conjunctions that are not necessarily free, i.e. some variables in them are identified. This leads to the notion of natural join, or join for short, introduced by Codd in the context of relational databases [12]. Its binary version, Boolean product, goes back to the 19th century [7]. Our definition of join parallels the definition of Cartesian product; the only difference is that the relation schemes no longer have to form a partition of $\Sigma$, only a cover.

Definition 2.

Given finite subsets $\Lambda_{i}\subseteq\Sigma$ and attributed relations $R_{i}\in\mathcal{D}^{\Lambda_{i}}$, their (natural) join $R\in\mathcal{D}^{\,\cup_{i}\Lambda_{i}}$ is defined as the set of concatenations of tuples from $R_{i}$ that match on overlaps of $\Lambda_{i}$:

$$R_{1}\Join\dots\Join R_{m}:=\{(a^{1}|\dots|\,a^{m})\,\Big|\,a^{i}\in R_{i},\,a^{i}_{\Lambda_{i}\cap\Lambda_{j}}=a^{j}_{\Lambda_{i}\cap\Lambda_{j}}\}.$$

A relation $R\in\mathcal{D}^{\Sigma}$ is a (natural) join over a cover $\Sigma=\Lambda_{1}\cup\dots\cup\Lambda_{m}$ when there exist $R^{\Lambda_{i}}\in\mathcal{D}^{\Lambda_{i}}$ such that $R=R^{\Lambda_{1}}\Join\dots\Join R^{\Lambda_{m}}$, and $R^{\Lambda_{i}}$ are called its join factors. It is called join reducible when $0<|\Lambda_{i}|<|\Sigma|$ for all $i$.

As the notation indicates, join of multiple factors is obtained by iterating Boolean product $\Join$. On attributed relations Boolean product is commutative and associative, just like Cartesian product, because ordering of positions does not get in the way.

In terms of predicates, joins are just conjunctions of predicates over subsets of variables with some variables shared among them:

$$R(x_{1},\dots,x_{n})=R^{\Lambda_{1}}(x_{\Lambda_{1}})\land\dots\land R^{\Lambda_{m}}(x_{\Lambda_{m}}). \tag{4}$$

This formula looks exactly like the free conjunction (2), except now $\Lambda_{i}$ may overlap, e.g. $R^{1,3}(x_{1},x_{3})\land R^{1,2,4}(x_{1},x_{2},x_{4})\land R^{3,4}(x_{3},x_{4})$ is a join. In mathematical logic, joins of predicates are called quantifier-free primitive positive formulas, and join closures are called weak partial co-clones in CSP literature [20, 22].

Note that (4) no longer implies that $R^{\Lambda_{i}}=\pi_{\Lambda_{i}}R$, as it did for Cartesian products. This is because some tuples in $R^{\Lambda_{i}}$ may not have companion tuples in other $R^{\Lambda_{j}}$ consistent with them on overlaps, and do not make it into the join. However, (4) remains valid if we replace $R^{\Lambda_{i}}$ by $\pi_{\Lambda_{i}}R$ because all and only joinable tuples are present in the projections. In this weakened sense, joins share the constructive property of Cartesian products: their (minimized) factors can be recovered as projections of the join itself, as pointed out by Rissanen [34]. When the join factors contain only joinable tuples they are said to join completely [28, 2.4].
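The difference between the original factors and the projections of the join is easy to see on a toy example. The Python sketch below is an illustration under our own conventions, not a prescribed implementation: an attributed tuple is represented as a frozenset of (attribute, value) pairs, and the names `join`, `project`, `row`, `r12`, `r23` are hypothetical.

```python
def join(r, s):
    """Natural join of two attributed relations (sets of frozensets of pairs)."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            # keep the pair only if the rows agree on every shared attribute
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

def project(r, attrs):
    """Projection of an attributed relation onto a set of attributes."""
    return {frozenset((k, v) for k, v in a if k in attrs) for a in r}

def row(mapping):
    """Build an attributed tuple from a dict {attribute: value}."""
    return frozenset(mapping.items())

r12 = {row({1: 0, 2: 0}), row({1: 1, 2: 1})}   # toy factor over attributes {1, 2}
r23 = {row({2: 0, 3: 5}), row({2: 2, 3: 7})}   # toy factor over attributes {2, 3}
joined = join(r12, r23)

# the tuple with values 1, 1 in r12 has no companion in r23 and drops out
minimized12 = project(joined, {1, 2})
minimized23 = project(joined, {2, 3})
```

In this example `minimized12` is strictly smaller than `r12`, yet joining `minimized12` with `minimized23` gives back `joined`, which is the weakened recovery property described above.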

There is also an analog of the independence criterion for Cartesian products [24].

Theorem 4 (Join criterion).

A relation $R\subseteq\mathcal{D}^{\Sigma}$ is a join over a cover $\Sigma=\Lambda_{1}\cup\dots\cup\Lambda_{m}$ if and only if the values of its tuples on $\Lambda_{i}$ can be chosen independently up to consistency on overlaps, i.e. for any collection of $\alpha^{i}\in\pi_{\Lambda_{i}}R$ with $\alpha^{i}_{\Lambda_{i}\cap\Lambda_{j}}=\alpha^{j}_{\Lambda_{i}\cap\Lambda_{j}}$ there is a common $a\in R$ such that $a_{\Lambda_{i}}=\alpha^{i}$.

The proof is analogous to the proof of Theorem 1 and we omit it.
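For a fixed cover and a finite relation, the join criterion amounts to checking that $R$ coincides with the join of its own projections. A brute-force Python sketch of this check follows. It is an illustration only: the name `is_join_over` is ours, attributes are assumed to be positions $0,\dots,n-1$, and the two test relations are $I_{3}$ and $D_{3}$ on a hypothetical three-element domain.

```python
from itertools import permutations, product

def project(rel, positions):
    """Projection of a set of tuples onto the given positions."""
    return {tuple(t[p] for p in positions) for t in rel}

def is_join_over(rel, cover, arity):
    """True if rel equals the join of its projections over the given cover.

    Assumption: `cover` is a list of lists of positions whose union is range(arity).
    """
    projections = [project(rel, block) for block in cover]
    rebuilt = set()
    for choice in product(*projections):
        values, consistent = {}, True
        for block, sub in zip(cover, choice):
            for p, v in zip(block, sub):
                # subtuples must agree wherever their blocks overlap
                if values.setdefault(p, v) != v:
                    consistent = False
        if consistent:
            rebuilt.add(tuple(values[p] for p in range(arity)))
    return rebuilt == rel

domain = (0, 1, 2)
identity3 = {(a, a, a) for a in domain}        # I_3
diversity3 = set(permutations(domain, 3))      # D_3
cover = [[0, 1], [1, 2]]
```

Over this cover `identity3` passes the test, since $x_{1}=x_{2}\land x_{2}=x_{3}$ defines it, while `diversity3` fails: the consistent subtuples $(0,1)$ and $(1,0)$ combine into $(0,1,0)$, which is not diverse. The search is exponential in the number of blocks, which illustrates why the criterion is of limited practical use.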

Unfortunately, this criterion is not nearly as useful in detecting either join reducibility or irreducibility. More constructive reducibility conditions are usually formulated in terms of data dependencies [3, 16, 24, 28, 30], the simplest of which is functional.

Definition 3.

Let $\Lambda,\textup{M}\subseteq\Sigma$. We say that a relation $R\subseteq\mathcal{D}^{\Sigma}$ has a functional dependency $\Lambda\to\textup{M}$ when its $\Lambda$ values determine its $\textup{M}$ values, i.e. for any $a,b\in R$ if $a_{\Lambda}=b_{\Lambda}$ then $a_{\textup{M}}=b_{\textup{M}}$. When $\textup{K}\to\textup{K}^{c}$ for some $\textup{K}\subset\Sigma$ then $\textup{K}$ is called a key of $R$, and more precisely a $k$-key, where $k=|\textup{K}|$.
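On a finite relation, Definition 3 can be checked mechanically. The following Python sketch is only an illustration: the helper names `has_fd` and `is_key` and the toy relation `people` are our assumptions, and positions are 0-based, unlike the attribute labels used in the text.

```python
def has_fd(rel, lhs, rhs):
    """Return True when tuples of `rel` that agree on positions `lhs` also agree on `rhs`."""
    seen = {}
    for t in rel:
        left = tuple(t[i] for i in lhs)
        right = tuple(t[i] for i in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_key(rel, key, arity):
    """A key K is a functional dependency from K to its complement."""
    rest = [i for i in range(arity) if i not in key]
    return has_fd(rel, key, rest)


# Hypothetical toy relation with columns (id, name, city).
people = {(1, "Ann", "Houston"), (2, "Ann", "Austin"), (3, "Bob", "Austin")}

print(is_key(people, [0], 3))    # the id column determines the whole record
print(is_key(people, [1], 3))    # the name "Ann" occurs with two different cities
print(has_fd(people, [2], [1]))  # the city does not determine the name either
```

A single pass with a dictionary suffices because a violation of $\Lambda\to\textup{M}$ is always witnessed by two tuples with the same $\Lambda$ values.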

In databases, key column(s) are those whose entries determine the entire record, for example, the ID column. This situation was prototypical for choosing the terminology: the key column(s) encode the objects the relation is about, and the rest list their attributes. What matters to us is that availability of keys ensures join reducibility: instead of relating all attributes to their objects in a single relation we can relate them one at a time. The idea goes back to the work of Peirce and is in close affinity to his “hypostatic abstraction” (see [9, 21] and Section 4).

Theorem 5.

Any $R\subseteq\mathcal{D}^{\Sigma}$ with a $k$-key $\textup{K}=\{i_1,\dots,i_k\}\subseteq\Sigma$ decomposes into a join of $(k+1)$-aries as

$$R=\operatorname*{\Join}_{i\notin\textup{K}}\pi_{\textup{K}\cup\{i\}}R=R_{i_{k+1}}\Join\dots\Join R_{i_n},\tag{5}$$

where $R_i:=\pi_{\textup{K}\cup\{i\}}R$. In the predicate form,

$$R(x_1,\dots,x_n)=R_{i_{k+1}}(x_{i_1},\dots,x_{i_k},x_{i_{k+1}})\land\dots\land R_{i_n}(x_{i_1},\dots,x_{i_k},x_{i_n}).\tag{6}$$

In particular, $R$ is join reducible when $k\leq n-2$, and join reducible to binaries when $k=1$.

Proof.

Without loss of generality, we may rename the attributes into integers and assume $\Sigma=\mathbb{N}_n$, $\textup{K}=\{1,\dots,k\}$. Suppose $a\in R$, then for any $i\geq k+1$ the subtuple $(a_1,\dots,a_k,a_i)\in R_i=\pi_{\textup{K}\cup\{i\}}R$ because it is the projection of $a$ to $\textup{K}\cup\{i\}$. Hence $a\in R_{k+1}\Join\dots\Join R_n$.

Now suppose $a\in R_{k+1}\Join\dots\Join R_n$, i.e. $(a_1,\dots,a_k,a_i)\in R_i$ for all $i\geq k+1$. By definition of $R_i$, there must be tuples $b^i\in R$ with $b^i_j=a_j$ for $j\leq k$ and $b^i_i=a_i$. But the first $k$ positions are a key, and all $b^i$ coincide on them with $a$, so all $b^i$ are one and the same tuple $b\in R$. Then $b_j=a_j$ for $j\leq k$ and $b_i=b^i_i=a_i$ for $i\geq k+1$, therefore $a=b\in R$. ∎

Example 3.

In the identity relation $I_n$ every position is a $1$-key. Taking $\textup{K}=\{n\}$ and applying (6), we get the familiar reduction to binaries

$$I_n(x_1,\dots,x_n)=I_2(x_1,x_n)\land\dots\land I_2(x_{n-1},x_n)=(x_1=x_n)\land\dots\land(x_{n-1}=x_n).$$

Example 4.

The division with remainder quaternary $\textup{Div}(n,d,q,r)$ ($n=dq+r$ with $n$ the dividend, $d\neq 0$ the divisor, $q$ the quotient and $0\leq r\leq d-1$ the remainder) on $\mathbb{N}\cup\{0\}$ has a $2$-key consisting of the dividend and the divisor. Therefore, it reduces to the join of two ternaries: $\textup{Div}^{1,2,3}(n,d,q)=\big(q=\lfloor\frac{n}{d}\rfloor\big)$, where $\lfloor\cdot\rfloor$ is the floor function, and $\textup{Div}^{1,2,4}(n,d,r)=\big(n\equiv r\pmod{d}\big)$.
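The reduction of Example 4 can be tested on a finite fragment of $\textup{Div}$. The Python sketch below is an illustration under our own assumptions: the cutoff `N`, the helper `project`, and the 0-based positions are not part of the text, and the check says nothing beyond the chosen fragment.

```python
N = 12  # hypothetical cutoff: dividends 0..N-1, divisors 1..N-1

# Finite fragment of Div(n, d, q, r) with n = d*q + r and 0 <= r <= d-1.
div = {(n, d, n // d, n % d) for n in range(N) for d in range(1, N)}


def project(rel, idxs):
    """Projection of a set of tuples to the 0-based positions `idxs`."""
    return {tuple(t[i] for i in idxs) for t in rel}


quot = project(div, (0, 1, 2))  # Div^{1,2,3}: the quotient ternary
rem = project(div, (0, 1, 3))   # Div^{1,2,4}: the remainder ternary

# Natural join of the two ternaries on the shared key (n, d).
joined = {
    (n, d, q, r)
    for (n, d, q) in quot
    for (n2, d2, r) in rem
    if (n, d) == (n2, d2)
}

print(joined == div)                          # the join recovers the fragment
print(len(project(div, (0, 1))) == len(div))  # (dividend, divisor) is a key
print(len(project(div, (0,))) == len(div))    # the dividend alone is not
```

Because the pair (dividend, divisor) occurs exactly once in each factor, the join produces no tuples beyond those of the fragment.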

Functional dependency is not necessary for join reducibility, not even for join decompositions of the special form (5)-(6). This is fortunate because it is a rare occurrence, just like its polar opposite, Cartesian independence of values, and we could not join reduce many relations if it were required.

Example 5.

Consider the ternary $R$ on $\mathcal{D}=\{\alpha,\beta\}$ given by the table below.

$$
R=\begin{array}{|c|c|c|}\hline
\alpha&\alpha&\alpha\\\hline
\alpha&\alpha&\beta\\\hline
\alpha&\beta&\alpha\\\hline
\alpha&\beta&\beta\\\hline
\beta&\alpha&\beta\\\hline
\end{array}
\qquad\qquad
R^{1,2}=\begin{array}{|c|c|}\hline
\alpha&\alpha\\\hline
\alpha&\beta\\\hline
\beta&\alpha\\\hline
\end{array}
\qquad
R^{1,3}=\begin{array}{|c|c|}\hline
\alpha&\alpha\\\hline
\alpha&\beta\\\hline
\beta&\beta\\\hline
\end{array}
$$

Since $\alpha$ is repeated in the first column, the latter cannot be a key. Nonetheless, $R$ is a join of the form (6): $R(x,y,z)=R^{1,2}(x,y)\land R^{1,3}(x,z)$, with the factors as shown above. This is because all possible combinations are present in $R$ in the second and third position after $\alpha$. As a result, $\sigma_{x_1=\alpha}R$ is a Cartesian product of unaries, and so is $\sigma_{x_1=\beta}R$ over the same partition (trivially, as a singleton).
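The claims of Example 5 are small enough to verify by direct enumeration. In the Python sketch below the strings `"alpha"` and `"beta"` stand for $\alpha$ and $\beta$, and the helper `selection_is_product` is a name we introduce for illustration.

```python
a, b = "alpha", "beta"
R = {(a, a, a), (a, a, b), (a, b, a), (a, b, b), (b, a, b)}

# Binary projections R^{1,2} and R^{1,3}.
R12 = {(x, y) for (x, y, _) in R}
R13 = {(x, z) for (x, _, z) in R}

# Join of the two factors on the shared first attribute.
joined = {(x, y, z) for (x, y) in R12 for (x2, z) in R13 if x == x2}

# The first column is a key only if no value repeats in it.
first_column_is_key = len({x for (x, _, _) in R}) == len(R)


def selection_is_product(value):
    """Check that the selection x_1 = value is a Cartesian product of unaries."""
    sel = {(y, z) for (x, y, z) in R if x == value}
    ys = {y for (y, _) in sel}
    zs = {z for (_, z) in sel}
    return sel == {(y, z) for y in ys for z in zs}


print(joined == R, first_column_is_key)
print(selection_is_product(a), selection_is_product(b))
```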

The above example illustrates a more general dependency introduced by Delobel, Fagin and Zaniolo around 1977 [28, 7.11], the multivalued dependency.

Definition 4.

Let $\Sigma=\textup{M}\cup\Lambda_1\cup\dots\cup\Lambda_m$ be a partition. We say that an $R\subseteq\mathcal{D}^{\Sigma}$ has a multivalued dependency $\textup{M}\twoheadrightarrow\Lambda_1\cup\dots\cup\Lambda_m$ when for all $\alpha\in\pi_{\textup{M}}R$ the selections $\sigma_{x_{\textup{M}}=\alpha}R$ are Cartesian products over the common partition $\Lambda_1\cup\dots\cup\Lambda_m$. When $\Lambda_i$ are singletons for all $i$ then $\textup{K}:=\textup{M}$ is called a multikey of $R$, and, more precisely, a $k$-multikey when $k=|\textup{K}|$.

Note that both functional dependency and Cartesian independence (with $\textup{M}=\emptyset$) are special cases of multivalued dependency. It turns out that such dependency is both necessary and sufficient for join decompositions of the special form (7)-(8), when the same attributes are shared by all factors.

Theorem 6 (Fagin [16]).

Let $\Sigma=\textup{M}\cup\Lambda_1\cup\dots\cup\Lambda_m$ be a partition. An $R\subseteq\mathcal{D}^{\Sigma}$ has a join decomposition of the form

$$R=\operatorname*{\Join}_{i=1}^{m}\pi_{\textup{M}\cup\Lambda_i}R=R_1\Join\dots\Join R_m,\tag{7}$$

where $R_i:=\pi_{\textup{M}\cup\Lambda_i}R$, or, equivalently,

$$R(x_1,\dots,x_n)=R_1(x_{\textup{M}}\,|\,x_{\Lambda_1})\land\dots\land R_m(x_{\textup{M}}\,|\,x_{\Lambda_m}),\tag{8}$$

if and only if $\textup{M}\twoheadrightarrow\Lambda_1\cup\dots\cup\Lambda_m$ is a multivalued dependency in $R$.

Proof.

One direction is trivial: if $R$ is of the form (8) then substituting $x_{\textup{M}}=\alpha$ turns it into a free conjunction since $\Lambda_i$ are disjoint. For the other direction, let $\Lambda:=\Lambda_1\cup\dots\cup\Lambda_m$ and note that for any $\alpha\in\pi_{\textup{M}}R$ we have

$$\sigma_{x_{\textup{M}}=\alpha}R\,(x_{\Lambda})=R_1^{\alpha}(x_{\Lambda_1})\land\dots\land R_m^{\alpha}(x_{\Lambda_m}),\tag{9}$$

with some $R_i^{\alpha}\subseteq\mathcal{D}^{\Lambda_i}$, by definition of Cartesian product over a partition. It remains to set $R_i(a):=R_i^{a_{\textup{M}}}(a_{\Lambda_i})$ when $a_{\textup{M}}\in\pi_{\textup{M}}R$ and $0$ (false) otherwise to get (8). ∎
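The equivalence of Theorem 6 can also be confirmed by exhaustive search in the smallest non-trivial case. The Python sketch below runs through all 256 ternary relations on a two-element domain with $\textup{M}$ the first attribute and two singleton blocks; the names `has_mvd` and `join_of_projections`, the 0-based positions, and the choice of domain are our assumptions. It is a sanity check of one instance, not a substitute for the proof.

```python
from itertools import product

DOMAIN = (0, 1)
ARITY = 3
M = (0,)               # shared attributes
BLOCKS = [(1,), (2,)]  # Lambda_1, Lambda_2


def project(rel, idxs):
    return {tuple(t[i] for i in idxs) for t in rel}


def has_mvd(rel, m, blocks):
    """Every selection with fixed M values is a Cartesian product over `blocks`."""
    selections = {}
    for t in rel:
        selections.setdefault(tuple(t[i] for i in m), set()).add(t)
    for sel in selections.values():
        size = 1
        for blk in blocks:
            size *= len(project(sel, blk))
        # sel always embeds into the product of its block projections,
        # so it equals that product exactly when the sizes agree
        if size != len(sel):
            return False
    return True


def join_of_projections(rel, m, blocks):
    """Right-hand side of (7): join of the projections to M + Lambda_i."""
    factors = [(m + blk, project(rel, m + blk)) for blk in blocks]
    return {
        t
        for t in product(DOMAIN, repeat=ARITY)
        if all(tuple(t[i] for i in idxs) in fac for idxs, fac in factors)
    }


cells = list(product(DOMAIN, repeat=ARITY))
agree = True
with_mvd = 0
for mask in range(2 ** len(cells)):
    rel = {c for bit, c in enumerate(cells) if mask >> bit & 1}
    mvd = has_mvd(rel, M, BLOCKS)
    agree = agree and (mvd == (join_of_projections(rel, M, BLOCKS) == rel))
    with_mvd += mvd

print(agree, with_mvd)
```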

Multivalued dependency reductions are popular in database design due to their constructive nature, but they do not exhaust all possible join reductions. For example, the diversity relation can be join reduced even to binaries, but not in the form (8):

$$D_n(x_1,\dots,x_n)=\bigwedge_{i<j}\neg I_2(x_i,x_j)=\bigwedge_{i<j}(x_i\neq x_j).\tag{10}$$
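For the ternary diversity relation on a three-element domain both halves of this remark can be checked directly. The Python sketch below is an illustration with names of our own choosing (`join_of_projections`, `key_forms`), and it only tests the instances of form (8) with a single shared attribute.

```python
from itertools import combinations, product

DOMAIN = range(3)
n = 3
universe = set(product(DOMAIN, repeat=n))
diversity = {t for t in universe if len(set(t)) == n}  # D_3: pairwise distinct entries


def project(rel, idxs):
    return {tuple(t[i] for i in idxs) for t in rel}


def join_of_projections(rel, schemes):
    """Join of the projections of `rel` to the given schemes (which must cover all positions)."""
    projections = [(idxs, project(rel, idxs)) for idxs in schemes]
    return {
        t
        for t in universe
        if all(tuple(t[i] for i in idxs) in p for idxs, p in projections)
    }


# Reduction (10): all binary schemes {i, j} with i < j.
all_pairs = list(combinations(range(n), 2))
print(join_of_projections(diversity, all_pairs) == diversity)

# Form (8) with one shared attribute m and the other two as singleton blocks.
key_forms = [[(m, j) for j in range(n) if j != m] for m in range(n)]
print([join_of_projections(diversity, schemes) == diversity for schemes in key_forms])
```

With the first attribute shared, the join admits tuples such as $(0,1,1)$, whose last two entries are never compared.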

However, the more complex a dependency the harder it is to detect and use to produce reductions. The definition of join dependency, for example, amounts to just saying that the relation is a join (but see [30] for a somewhat more cogent characterization).

Be that as it may, we will now show that even taking all possible join reductions into account there are still irreducible relations of arbitrarily high arities. To this end, we observe that the proof of Theorem 2 did not use the fact that the $\Lambda_i$ form a partition, and so goes through for join reductions without a change. Therefore, its conclusion can be strengthened.

Theorem 7.

Suppose one of the following conditions holds for $R\subseteq\mathcal{D}^{\Sigma}$.

(i) $R$ is not universal, but for any non-empty proper subset $\emptyset\subset\Lambda\subset\Sigma$ we have $\pi_{\Lambda}R=\mathcal{D}^{\Lambda}$.

(ii) $\neg R$ is not empty, but for any $i\in\Sigma$ we have $\pi_i(\neg R)\neq\mathcal{D}$.

Then $R$ is join irreducible.

The reasoning from Example 2 now shows that $\neg I_n$ and $\neg I_n^{\mathcal{A}}$ for non-empty proper $\mathcal{A}\subset\mathbb{N}_n$ are join irreducible for any $n\geq 1$ and $|\mathcal{D}|\geq 2$. Worse yet, join reductions leave asymptotically ‘almost all’ relations irreducible, albeit not in as strong a sense as Cartesian factorizations.
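Condition (i) of Theorem 7 is easy to test for $\neg I_3$ on a two-element domain. The Python sketch below also checks irreducibility directly; this relies on the assumption, standard for natural joins, that a join reducible relation coincides with the join of its projections to the $(n-1)$-element schemes. All names in the code are ours.

```python
from itertools import combinations, product

DOMAIN = (0, 1)
n = 3
universe = set(product(DOMAIN, repeat=n))
not_identity = {t for t in universe if len(set(t)) > 1}  # complement of I_3


def project(rel, idxs):
    return {tuple(t[i] for i in idxs) for t in rel}


# Condition (i): not universal, but every proper projection is full.
proper_subsets = [idxs for k in range(1, n) for idxs in combinations(range(n), k)]
condition_i = not_identity != universe and all(
    project(not_identity, idxs) == set(product(DOMAIN, repeat=len(idxs)))
    for idxs in proper_subsets
)

# Join of the projections to all (n-1)-element schemes.
schemes = list(combinations(range(n), n - 1))
projections = {idxs: project(not_identity, idxs) for idxs in schemes}
joined = {
    t
    for t in universe
    if all(tuple(t[i] for i in idxs) in projections[idxs] for idxs in schemes)
}

print(condition_i)
print(joined == universe, joined == not_identity)
```

Since every proper projection is full, the join of the projections is the universal relation, so the two excluded constant tuples cannot be cut out by any join of lower arity factors.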

Theorem 8.

The share of join reducible $n$-ary relations among all such relations on a domain $\mathcal{D}$ is $<1$ for $|\mathcal{D}|>n$, and asymptotically vanishes when $n\geq 1$ and $|\mathcal{D}|\to\infty$.

Proof.

Suppose $R$ is join reducible to a conjunction (4). We can always interpret $R^{\Lambda_i}$ as having attributes from any superset $\Lambda_i'\supset\Lambda_i$ by conjoining it with the universal relation on the scheme $\Lambda_i'\backslash\Lambda_i$. Since all $\Lambda_i$ are proper subsets of $\mathbb{N}_n$ each of them is contained in one of $\Lambda\subset\mathbb{N}_n$ with $|\Lambda|=n-1$. Therefore, after merging predicates with identical schemes if necessary, we can represent $R$ in the form (see [30])

$$R(x_1,\dots,x_n)=\bigwedge_{|\Lambda|=n-1}R^{\Lambda}(x_{\Lambda})=\bigwedge_{i=1}^{n}R^{\mathbb{N}_n\backslash\{i\}}\big(x_{\mathbb{N}_n\backslash\{i\}}\big).$$

There are $n$ such $\Lambda$ and $|\mathcal{P}(\mathcal{D}^{\Lambda})|=2^{|\mathcal{D}|^{|\Lambda|}}=2^{|\mathcal{D}|^{n-1}}$ choices for each $R^{\Lambda}$. Some of them may be incompatible, so we get an upper bound on the number of join reducible relations: $N_{jred}\leq 2^{n|\mathcal{D}|^{n-1}}$. The conclusions now follow by applying calculus to the fraction

$$\frac{2^{n|\mathcal{D}|^{n-1}}}{2^{|\mathcal{D}|^{n}}}=2^{-(|\mathcal{D}|-n)|\mathcal{D}|^{n-1}}.\tag{11}$$

Note that if $|\mathcal{D}|$ is fixed and $n\to\infty$ the ratio in (11) goes to $\infty$, not to $0$, so we cannot conclude that increasing arity on a fixed domain also leads to a vanishing share of join reducible relations, as we could for degenerate ones in Theorem 3.
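The behavior of the bound (11) is easy to tabulate, and for binary relations it can be compared with an exact count. The Python sketch below is an illustration with function names of our own; it uses the fact that for $n=2$ join reducibility means being a Cartesian product of the two unary projections.

```python
from fractions import Fraction
from itertools import product


def share_bound(d, n):
    """Right-hand side of (11) for |D| = d, kept as an exact fraction."""
    return Fraction(2) ** (-(d - n) * d ** (n - 1))


def reducible_share(d):
    """Exact share of join reducible binary relations (n = 2) on a d-element domain."""
    cells = list(product(range(d), repeat=2))
    reducible = 0
    for mask in range(2 ** len(cells)):
        rel = {c for bit, c in enumerate(cells) if mask >> bit & 1}
        rows = {x for (x, _) in rel}
        cols = {y for (_, y) in rel}
        # reducible exactly when rel is the product of its unary projections
        reducible += rel == {(x, y) for x in rows for y in cols}
    return Fraction(reducible, 2 ** len(cells))


print(reducible_share(3), share_bound(3, 2))     # exact share against the bound
print([share_bound(d, 2) for d in (3, 4, 5)])    # shrinks as the domain grows
print([share_bound(2, n) for n in (3, 4, 5)])    # exceeds 1 once n > |D|
```

The last line illustrates why the bound is uninformative for growing arity on a fixed domain: the exponent $-(|\mathcal{D}|-n)|\mathcal{D}|^{n-1}$ becomes positive as soon as $n>|\mathcal{D}|$.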

4 Projective joins and hypostatic abstraction

As noted by Rissanen [34], joins are the most general form of decomposition where the constituents can be recovered by taking projections. Yet they still admit irreducibles of arbitrarily high arity, and many of them. If we want a more manageable collection of irreducibles we need to relax the constructivism. There is also a more practical reason. When relations are entered into databases the expected data dependencies are often prescribed by design. What if potential entrants do not have them? They are then “pre-treated” before incorporation to mend that. For example, full names are expected to function as relation keys, but identical namesakes do occur sometimes. This is mended by attaching ID columns to the tables that restore uniqueness (and hence, functional dependency).

Such pre-treatment is often done “under the desk”, but it is of interest to develop its devices systematically. Attaching a key appears already in C.S. Peirce’s works on relations from the 1890s under the name of “hypostatic abstraction” [8, 9]. Consider the ternary relation $G(x,y,z):=$ “$x$ gives $y$ to $z$”. To reduce it to binaries, Peirce introduces (“hypostatizes”) new abstracted objects $t$, the acts of giving, and treats $x$, $y$ and $z$ as their attributes, the giver, the gift, and the recipient. If we denote their relations to the object by $G'(t,x)$, $G''(t,y)$ and $G'''(t,z)$, respectively, then we can form a key-form join $G'(t,x)\land G''(t,y)\land G'''(t,z)$ that carries all the information of $G(x,y,z)$. Except it also features the acts of giving $t$ that are not among the original attributes. What we really need to say is that there is such an act for given $x$, $y$ and $z$:

$$G(x,y,z)=\exists t\left[G'(t,x)\land G''(t,y)\land G'''(t,z)\right].\tag{12}$$

This is a reduction of sorts, but it is no longer a join, rather a projection of a join. Its factors cannot be recovered from $G$ even if we know their relation schemes because of the freedom in $t$ assignments. But it does fit well with the algebraic ideology of joins: first, we represent a given $R$ as a projection of a higher arity relation $\widehat{R}$, and then factor $\widehat{R}$ into a join. While the factors are not projections of $R$, they are projections of $\widehat{R}$, as is $R$ itself.
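The construction behind (12) can be imitated on a small table. In the Python sketch below the relation `G`, the people and gifts in it, and the integer identifiers of the acts of giving are all hypothetical; the variables `giver`, `gift` and `recipient` play the roles of $G'$, $G''$ and $G'''$.

```python
# Hypothetical instance of "x gives y to z".
G = {
    ("Ann", "book", "Cid"),
    ("Ann", "pen", "Dan"),
    ("Eve", "book", "Dan"),
    ("Eve", "pen", "Cid"),
}

# Hypostatize: give every act of giving its own identifier t.
acts = dict(enumerate(sorted(G)))
giver = {(t, x) for t, (x, _, _) in acts.items()}
gift = {(t, y) for t, (_, y, _) in acts.items()}
recipient = {(t, z) for t, (_, _, z) in acts.items()}

# Formula (12): there is a t related to x, y and z by the three binaries.
reconstructed = {
    (x, y, z)
    for (t1, x) in giver
    for (t2, y) in gift
    for (t3, z) in recipient
    if t1 == t2 == t3
}

# For comparison: the join of the binary projections of G itself, with no t.
xy = {(x, y) for (x, y, _) in G}
xz = {(x, z) for (x, _, z) in G}
yz = {(y, z) for (_, y, z) in G}
plain_join = {(x, y, z) for (x, y) in xy for (x2, z) in xz if x == x2 and (y, z) in yz}

print(reconstructed == G)
print(len(plain_join), ("Ann", "book", "Dan") in plain_join)
```

Relabeling the identifiers changes the three binaries but not the reconstructed relation, which is the freedom in $t$ assignments mentioned above.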

Definition 5.

The projective join, or just projoin, of attributed relations $R_i\subseteq\mathcal{D}^{\Lambda_i}$ with the set of projected attributes $\Gamma\subseteq\cup_i\Lambda_i$ is $\pi_{\Gamma}\left[R_1\Join\dots\Join R_m\right]$, i.e. the projection of their (natural) join to $\Gamma$. A relation $R\subseteq\mathcal{D}^{\Sigma}$ is a projoin over a cover $\textup{T}\cup\Sigma=\Lambda_1\cup\dots\cup\Lambda_m$ with $\textup{T}\cap\Sigma=\emptyset$ when there exist $R^{\Lambda_i}\subseteq\mathcal{D}^{\Lambda_i}$, called its projoin factors, such that

$$R=\pi_{\Sigma}\!\left[R^{\Lambda_1}\Join\dots\Join R^{\Lambda_m}\right]=\pi_{\Sigma}\widehat{R}.$$

The join $\widehat{R}$ is called the augmented relation, and elements of $\textup{T}$ (projoin) parameters. $R$ is called projoin reducible when it is a projoin with $0<|\Lambda_i|<|\Sigma|$.

The corresponding predicate formula for projoins is:

\[ R(x_1,\dots,x_n)=\exists x_{\textup{T}}\left[R^{\Lambda_1}(x_{\Lambda_1})\land\dots\land R^{\Lambda_m}(x_{\Lambda_m})\right]. \tag{13} \]

It is more typical to consider reductions in the algebra with two separate operations, join and projection, and project-join expressions are studied in [15, 38] and are part of a link between the relational model and Tarski’s cylindric algebras [10, 18]. In terms of predicates, passing from joins to projoins means that in reductions we allow existential quantification on top of conjunction and identification of variables, i.e. consider primitive positive expressions (assuming the binary identity is among the available predicates). Up to switching from attributed to ordinary relations, the projoin reduction is essentially equivalent to reduction in the relational (co-clone) algebra of clone theory [14, 23, 33]. As such, it plays a key role in characterizing complexity of constraint satisfaction problems (CSP), and the study of generating sets of small arity in connection with them can be applied to solving Peircian reduction problems on finite domains [5].
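To make the definition concrete, here is a minimal Python sketch of a projoin on a toy example. The representation of an attributed relation as an `(attrs, rows)` pair, the helper names `join`, `project` and `projoin`, and the sample data are all illustrative assumptions, not notation from the text. The last line recomputes the same relation directly from the predicate form (13), with the single parameter `t` existentially quantified.

```python
def join(r, s):
    """Natural join of two attributed relations, each an (attrs, rows) pair."""
    (ra, rrows), (sa, srows) = r, s
    extra = [a for a in sa if a not in ra]    # attributes that occur only in s
    shared = [a for a in sa if a in ra]       # attributes the rows must agree on
    rows = set()
    for x in rrows:
        xd = dict(zip(ra, x))
        for y in srows:
            yd = dict(zip(sa, y))
            if all(xd[a] == yd[a] for a in shared):
                rows.add(x + tuple(yd[a] for a in extra))
    return tuple(ra) + tuple(extra), rows


def project(r, keep):
    """Projection of an attributed relation to the attributes in keep."""
    attrs, rows = r
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(x[i] for i in idx) for x in rows}


def projoin(factors, keep):
    """Projection to keep of the natural join of all factors."""
    joined = factors[0]
    for f in factors[1:]:
        joined = join(joined, f)
    return project(joined, keep)


# Two binary factors sharing the parameter attribute "t" (toy data).
R1 = (("t", "x1"), {(1, "a"), (2, "b")})
R2 = (("t", "x2"), {(1, "c"), (2, "d"), (2, "e")})

attrs, R = projoin([R1, R2], ("x1", "x2"))

# Predicate form: R(x1, x2) holds iff some t satisfies R1(t, x1) and R2(t, x2).
by_formula = {(x1, x2) for (t, x1) in R1[1] for (s, x2) in R2[1] if s == t}
```

With one parameter and two binary factors, the projoin is just the relative product of the two factors along `t`.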

Just as join dependencies can be expressed using joins, generalized data dependencies can be expressed using project-join equations [38]. With projoins, one can use the flexibility of augmenting the relation to make the decomposition methods for joins more broadly applicable. The simplest method used a key, but whether a relation has a key is, in some ways, a bookkeeping matter. This is another reason for moving from joins to projoins.

Definition 6.

We say that a relation $R\subseteq\mathcal{D}^{n}$ admits a key (multikey) when it is a projection of a relation $\widehat{R}\subseteq\mathcal{D}^{\textup{T}\cup\Sigma}$ with a key (multikey), i.e. $R=\pi_{\Sigma}\widehat{R}$.

Admitting a key is less of a bookkeeping matter than already having it. Note that $\textup{T}$ need not be the key in itself; the key may combine $\textup{T}$ with some attributes from the original relation. However, whether $R$ admits a key or not does not depend on which attributes are used in it.

Lemma 1.

A relation $R\subseteq\mathcal{D}^{\Sigma}$ admits a $k$-key if and only if $|R|\leq|\mathcal{D}|^{k}$.

Proof.

Suppose $R=\pi_{\Sigma}\widehat{R}$ and $\textup{K}\subseteq\textup{T}\cup\Sigma$ is a key of $\widehat{R}$ with $|\textup{K}|=k$. Since there are at most $|\mathcal{D}|^{k}$ distinct tuples in the $\textup{K}$ positions of $\widehat{R}$, and they determine the remaining values, $|\widehat{R}|\leq|\mathcal{D}|^{k}$. But projection never increases the cardinality of a relation, so $|R|=|\pi_{\Sigma}\widehat{R}|\leq|\widehat{R}|\leq|\mathcal{D}|^{k}$.

Conversely, suppose $|R|\leq|\mathcal{D}|^{k}$. Then we can pick any $\textup{T}$ of cardinality $k$ with unused attributes, and assign to every tuple $a\in R$ a unique label $t(a)\in\mathcal{D}^{\textup{T}}$. We define $\widehat{R}$ as the set of augmented tuples $\widehat{R}:=\{(t(a),a)\,|\,a\in R\}$. By construction, the first $k$ positions of $\widehat{R}$ are its key and $\pi_{\Sigma}\widehat{R}=R$. ∎
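The labelling construction in the second half of the proof can be sketched in a few lines of Python for finite domains. The function name `augment_with_key` and the sample relation are hypothetical, and the labels are simply assigned in sorted order, which is one arbitrary choice of the map $t(a)$.

```python
from itertools import product


def augment_with_key(R, D, k):
    """Prepend a distinct label from D^k to every tuple of R."""
    rows = sorted(R)
    if len(rows) > len(D) ** k:
        raise ValueError("|R| > |D|^k: R does not admit a k-key")
    labels = product(sorted(D), repeat=k)   # |D|^k labels, enough for all rows
    return {t + a for t, a in zip(labels, rows)}


D = {0, 1}
R = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}       # |R| = 3 <= |D|^2 = 4
k = 2
R_hat = augment_with_key(R, D, k)

labels_used = [row[:k] for row in R_hat]
is_key = len(set(labels_used)) == len(R_hat)      # label positions determine the row
projects_back = {row[k:] for row in R_hat} == R   # projecting away T recovers R
```

On this two-element domain the same relation has no $1$-key augmentation, since $3>2$; calling `augment_with_key(R, D, 1)` raises the error.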

Recall from the previous section that relations with a $1$-key are quite rare on finite domains. A striking consequence of this lemma is that on infinite domains, any relation admits a $1$-key. Indeed, by cardinal arithmetic, $|\mathcal{D}|^{n}=|\mathcal{D}|$ for any infinite $\mathcal{D}$ and finite $n\geq 1$, so $|R|\leq|\mathcal{D}|^{n}=|\mathcal{D}|$ for any $n$-ary relation on $\mathcal{D}$. The next theorem generalizes Peirce’s trick (12) for the relation of giving and connects it to relation keys, see also [8] for the general case.

Theorem 9 (Hypostatic abstraction).

Any $R\subseteq\mathcal{D}^{\Sigma}$ with $|R|\leq|\mathcal{D}|^{k}$ decomposes into a projoin of $(k+1)$-aries as

\[ R=\pi_{\Sigma}\!\left[\mathop{\Join}\limits_{i=1}^{n}\pi_{\textup{T}\cup\{i\}}\widehat{R}\right]=\pi_{\Sigma}\!\left[R_1\Join\dots\Join R_n\right], \tag{14} \]

where $R_i:=\pi_{\textup{T}\cup\{i\}}\widehat{R}$ for some $\widehat{R}\subseteq\mathcal{D}^{\textup{T}\cup\Sigma}$ with $|\textup{T}|=k$. In the predicate form,

\[ R(x_1,\dots,x_n)=\exists t_1\dots\exists t_k\left[R_1(t_1,\dots,t_k,x_1)\land\dots\land R_n(t_1,\dots,t_k,x_n)\right]. \tag{15} \]

In particular, $R$ is projoin reducible when $k\leq n-2$, and projoin reducible to binaries when $k=1$.

Proof.

The augmented relation $\widehat{R}\subseteq\mathcal{D}^{\textup{T}\cup\Sigma}$ is constructed in the proof of Lemma 1. Since the augmented attributes $\textup{T}$ are its $k$-key by construction, the decomposition formula (14) follows from Theorem 5. In (15) we specialized to $\textup{T}=\{t_1,\dots,t_k\}$ for concreteness, and ordered $\textup{T}\cup\Sigma$ so that the $\textup{T}$ positions go before the original ones. ∎
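As a sanity check on a small finite case, the following Python sketch builds the factors $R_i$ of (14) from a labelled relation and then evaluates the right-hand side of (15) by brute force. The names `hypostatic_factors` and `recombine`, the sorted-order labelling, and the sample ternary relation are assumptions made for illustration only.

```python
from itertools import product


def hypostatic_factors(R, D, k, n):
    """Factors R_i: projections of the augmented relation to T and position i."""
    rows = sorted(R)
    if len(rows) > len(D) ** k:
        raise ValueError("|R| > |D|^k")
    labelled = list(zip(product(sorted(D), repeat=k), rows))
    return [{t + (a[i],) for t, a in labelled} for i in range(n)]


def recombine(factors, D, k):
    """Tuples x for which some t satisfies R_i(t, x_i) for every i."""
    n = len(factors)
    return {
        x
        for x in product(sorted(D), repeat=n)
        if any(
            all(t + (x[i],) in factors[i] for i in range(n))
            for t in product(sorted(D), repeat=k)
        )
    }


D = {0, 1, 2}
R = {(0, 1, 2), (1, 1, 0), (2, 0, 0)}   # |R| = 3 <= |D|, so k = 1 suffices
factors = hypostatic_factors(R, D, k=1, n=3)
rebuilt = recombine(factors, D, k=1)
```

Here each of the three factors is binary, so this particular ternary relation is reduced to binaries with a single parameter.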

Hypostatic abstraction radically simplifies the picture of reducibility on infinite domains, and delivers a “small” set of irreducibles that eluded us with Cartesian products and joins. As we mentioned, this is similar to factorization of polynomials over $\mathbb{R}$, where the only irreducible ones are linear and quadratic.

Corollary 1.

Any $n$-ary with $n\geq 3$ on an infinite domain reduces to a projoin of $n$ binaries. The only projoin irreducible relations on such domains are all unaries and non-degenerate binaries.

Proof.

The first claim is a direct consequence of cardinal arithmetic for infinite $|\mathcal{D}|$ and Theorem 9. Unaries have nothing to be reduced to. Binaries can only be reduced to unaries. But quantifying over a conjunction of unary predicates still produces a conjunction of unary predicates. Since unaries have only one variable each, it is a free conjunction, i.e. an unaric Cartesian product. Thus, projoin reducible binaries must be degenerate. ∎

Löwenheim might have been the first to state and prove this result in a more formal manner in 1915 [27]. His proof used a somewhat more cumbersome iterated pairing construction instead of hypostatic abstraction. In fact, the result is even stronger than stated. Recall that projoins are ranked by the number of parameters, with joins having none, $1$-key hypostatic reductions like (12) having one, and so on. It follows from the proof that all higher arity relations are not only projoin reducible on infinite domains, but even projoin reducible with a single parameter. This is not the case on finite domains, as we will now demonstrate.

Example 6.

Consider $\neg I_n$ on a finite domain $\mathcal{D}$. If it is projoin reducible then factors without augmented attributes can be dropped. Indeed, in the predicate representation (13) those factors can be taken out of the scope of quantifiers, and each carries a proper subset of $\neg I_n$’s variables. But $\neg I_n$ is universal on any proper subset of its positions, and hence so must be those factors since they enter conjunctively. Lower arity factors can be absorbed into those of arity $n-1$, so for $n=3$ any projoin reduction with one parameter condenses to just this

\[ \neg I_3(x_1,x_2,x_3)=\exists t\left[R_1(t,x_1)\land R_2(t,x_2)\land R_3(t,x_3)\right]. \tag{16} \]

We will now restrict to a two-element domain $\mathcal{D}:=\{\alpha,\beta\}$. Then $t$ can only take two values, and, recalling the interpretation of the existential quantifier, (16) represents $\neg I_3$ as a disjunction of two unary conjunctions, i.e. a union of two unary Cartesian products. Possible cardinalities of such products on a two-element domain are $1,2,4$ and $8$. But $8$ is impossible because $|\neg I_3|=6<8$, and $4$ comes from a product of two doublets and a singleton. But a product of two doublets will include both $(\alpha,\alpha)$ and $(\beta,\beta)$, so multiplying it by any singleton will produce a constant tuple not in $\neg I_3$. So $4$ is also impossible. As for $1$ and $2$, no pair of them can cover all $6$ tuples of $\neg I_3$.

Thus, on two-element domains $\neg I_3$ is not projoin reducible with only one parameter. A similar counting argument works also for $|\mathcal{D}|=3$. For larger $|\mathcal{D}|$ combinatorial considerations of this sort quickly become unmanageable. For $n>3$ even single parameter projoin reductions are no longer of the simple form (16), as free variables can be shared and $\neg I_n$ need not be a union of Cartesian products, see Example 8.
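For the two-element case the counting argument can also be confirmed by exhaustive search. The Python sketch below enumerates all triples of binary relations on a two-element domain (encoded here as `0` and `1` in place of $\alpha,\beta$, an arbitrary choice) and checks whether any of them satisfies (16). It only tests decompositions of the exact shape (16), and the helper names are hypothetical.

```python
from itertools import combinations, product

D = (0, 1)
pairs = list(product(D, repeat=2))

# All 16 binary relations on D, as frozensets of (t, x) pairs.
binaries = [
    frozenset(c)
    for size in range(len(pairs) + 1)
    for c in combinations(pairs, size)
]

# The ternary non-identity: all triples that are not constant.
not_I3 = {x for x in product(D, repeat=3) if len(set(x)) > 1}


def one_param_projoin(R1, R2, R3):
    """Triples (x1, x2, x3) with some t in R1(t, x1), R2(t, x2), R3(t, x3)."""
    return {
        (x1, x2, x3)
        for x1, x2, x3 in product(D, repeat=3)
        if any((t, x1) in R1 and (t, x2) in R2 and (t, x3) in R3 for t in D)
    }


# Search all 16^3 = 4096 triples of binary factors.
found = any(
    one_param_projoin(*factors) == not_I3
    for factors in product(binaries, repeat=3)
)
```

The search leaves `found` equal to `False`, in agreement with the argument above. The same brute force grows very quickly with the domain size, which is why it is only a check for small cases.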

A counting argument similar to that of Theorem 8 further shows that the reducibility behavior on finite domains is quite different.

Theorem 10.

The share of $n$-ary, $n\geq 3$, relations projoin reducible with $k$ parameters among all such relations on a domain $\mathcal{D}$ is $<1$ for $|\mathcal{D}|>\binom{n+k}{n-1}$, and asymptotically vanishes when $|\mathcal{D}|\to\infty$.

Proof.

As in the proof of Theorem 8, we transform the projoin into a form with factors of maximal possible arity in a reduction, $n-1$. Only this time, due to the augmented positions, their total number is $\binom{n+k}{n-1}$ rather than $n$ (to which it reduces for $k=0$). The rest of the proof is analogous, and leads to $2^{-\left(|\mathcal{D}|-\binom{n+k}{n-1}\right)\,|\mathcal{D}|^{n-1}}$ as the upper bound for the share. ∎
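To get a feel for how fast this bound decays, one can evaluate it exactly for small parameters. The sketch below is only arithmetic on the formula from the proof; the function name `share_upper_bound` is a hypothetical helper, and the bound is informative only when the domain size exceeds the binomial coefficient.

```python
from fractions import Fraction
from math import comb


def share_upper_bound(d, n, k):
    """Evaluate 2^(-(d - C(n+k, n-1)) * d^(n-1)) exactly, with d = |D|."""
    exponent = (d - comb(n + k, n - 1)) * d ** (n - 1)
    if exponent >= 0:
        return Fraction(1, 2 ** exponent)
    return Fraction(2 ** (-exponent))   # bound exceeds 1 and says nothing


# Ternary relations (n = 3) with one parameter (k = 1): C(4, 2) = 6.
at_threshold = share_upper_bound(6, 3, 1)   # |D| = 6, exponent is 0
just_above = share_upper_bound(7, 3, 1)     # |D| = 7, exponent is 49
```

Already one element above the threshold, the bound for ternary relations with one parameter is $2^{-49}$.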

What are we to make of such a striking discrepancy between reducibilities on finite and infinite domains? The universal applicability of hypostatic abstraction on infinite domains derives directly from non-constructive and ‘unnatural’ bijections between $\mathcal{D}$ and its Cartesian powers. By a theorem of Tarski [19, 11.3], $|\mathcal{D}|^{2}=|\mathcal{D}|$ for all infinite $\mathcal{D}$ is equivalent to the axiom of choice. Perhaps, it is of interest to consider reduction in models of set theory where availability of such bijections is restricted and the $|\mathcal{D}|,|\mathcal{D}|^{2},\dots$ hierarchy does not collapse, such as ZF models with Dedekind finite sets, or to restrict maps allowed in definitions of reducing relations directly. These may model relations on large finite domains better than do all relations on infinite domains of ZFC.

5 Projections as unions

As the results of the previous section show, reductions more general than (14)-(15) are still of interest on finite domains. Indeed, they may be of interest even on infinite domains as analyses of relations alternative to hypostatic abstraction. In this section we build on the relationship between existential quantification (projection) and unions exploited in Example 6 to construct such reductions. It is more technical than the rest of the paper and can be skipped without loss of continuity. We start by giving a projective version of Fagin’s theorem.

Theorem 11.

An $R\subseteq\mathcal{D}^{\Sigma}$ has a projoin decomposition of the form:

\[ R=\pi_{\Sigma}\!\left[\mathop{\Join}\limits_{i=1}^{m}R_i\right]=\pi_{\Sigma}\!\left[R_1\Join\dots\Join R_m\right], \tag{17} \]

where $R_i\subseteq\mathcal{D}^{\textup{T}\cup\Lambda_i}$ for some partition $\Sigma=\Lambda_1\cup\dots\cup\Lambda_m$, or, in the predicate form,

\[ R(x_1,\dots,x_n)=\exists t_1\dots\exists t_k\left[R_1(t_1,\dots,t_k,x_{\Lambda_1})\land\dots\land R_m(t_1,\dots,t_k,x_{\Lambda_m})\right], \tag{18} \]

if and only if it is a union of Cartesian products over the common partition $\Lambda_i$ with no more than $|\mathcal{D}|^{|\textup{T}|}$ terms.

Proof.

Suppose $R=\pi_{\Sigma}\widehat{R}$ with $\widehat{R}=\mathop{\Join}_{i=1}^{m}R_i$. Then, by Fagin’s theorem, there is a multivalued dependency $\textup{T}\twoheadrightarrow\Lambda_1\cup\dots\cup\Lambda_m$ in $\widehat{R}$. But then, by definition, for all $a\in\pi_{\textup{T}}\widehat{R}$ the projected selections $\pi_{\Sigma}\sigma_{x_{\textup{T}}=a}\widehat{R}$ are Cartesian products over $\Lambda_i$, and

\[ R=\bigcup_{a\in\pi_{\textup{T}}\widehat{R}}\pi_{\Sigma}\sigma_{x_{\textup{T}}=a}\widehat{R}\,. \]

Since $\pi_{\textup{T}}\widehat{R}\subseteq\mathcal{D}^{\textup{T}}$, the number of terms in the union is no more than $|\mathcal{D}^{\textup{T}}|=|\mathcal{D}|^{|\textup{T}|}$.

Conversely, suppose $R$ is a union of Cartesian products over $\Lambda_i$ with no more than $|\mathcal{D}|^{|\textup{T}|}$ terms. Select a distinct $a\in\mathcal{D}^{\textup{T}}$ for each term and label its term $R^{a}$. Then define

\[ \widehat{R}:=\bigcup_{a}\,\{(a,x)\,|\,x\in R^{a}\}. \]

By construction, $R=\bigcup_{a}R^{a}$, and since $R^{a}$ are Cartesian products over $\Lambda_i$ we have $\textup{T}\twoheadrightarrow\Lambda_1\cup\dots\cup\Lambda_m$. Now Fagin’s theorem gives the desired decomposition of $\widehat{R}$. ∎
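The converse direction is constructive, and a small Python sketch may help to see it at work. Each term of the union is given as a list of blocks, one block per $\Lambda_i$; the sketch assumes, purely for convenience, that the positions of each $\Lambda_i$ are consecutive and listed in order, so that tuples can be concatenated. The function names and the sample relation are hypothetical.

```python
from itertools import product


def union_of_products(terms):
    """The relation R as the union of the Cartesian products in terms."""
    return {sum(ys, ()) for term in terms for ys in product(*term)}


def fagin_factors(terms, D, k):
    """Label each term by a distinct a in D^k; R_i collects (a, block-i tuple)."""
    labels = list(product(sorted(D), repeat=k))
    if len(terms) > len(labels):
        raise ValueError("more than |D|^k terms")
    m = len(terms[0])
    return [
        {a + y for a, term in zip(labels, terms) for y in term[i]}
        for i in range(m)
    ]


def projoin(factors, D, k):
    """Join the factors on the k label positions, then project the labels away."""
    out = set()
    for a in product(sorted(D), repeat=k):
        slices = [{r[k:] for r in f if r[:k] == a} for f in factors]
        out |= {sum(ys, ()) for ys in product(*slices)}
    return out


D = {0, 1}
# Partition: Lambda_1 = {1}, Lambda_2 = {2, 3}; two Cartesian product terms.
terms = [
    [{(0,)}, {(0, 1), (1, 0)}],
    [{(1,)}, {(0, 0), (1, 1)}],
]
R = union_of_products(terms)
factors = fagin_factors(terms, D, k=1)
rebuilt = projoin(factors, D, k=1)
```

Since the blocks $\Lambda_i$ are disjoint, the factors share only the label positions, so for a fixed label the join is just the Cartesian product of the corresponding slices.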

Theorem 11 underscores a close relationship between projection and union, or, in predicate terms, between existential quantification and disjunction. One can represent an existentially quantified formula as a disjunction by moving quantified variables into an index, as in $\exists t\,R(t,x)=\bigvee_{t}R^{t}(x)$. For finite relations, existential quantification can be converted into disjunctions completely. However, there are two limitations on such disjunctions. First, the number of disjuncts is limited by $|\mathcal{D}|^{k}$, where $k$ is the number of bound variables used, and second, the arity of disjuncts is then increased by $k$. As a result, if $k\geq n$ is needed to get the number of disjuncts under $|\mathcal{D}|^{k}$ then the disjunction does not convert into a projoin reduction.

On the other hand, allowing disjunctions/unions without restrictions completely trivializes the reduction problem. Any relation is a union of its tuples, and each tuple is the unary Cartesian product of singletons containing its members, i.e. $R=\bigcup_{a\in R}\,\{a_1\}\times\dots\times\{a_n\}$. Even if only finite unions are allowed, every relation on a finite domain will ‘reduce’ to unary Cartesian products. However, in principle, it may be of interest to explore unions with restrictions other than those imposed by the existential quantifier, e.g. with bounds on the number of terms independent of the size of the domain.

Example 6 may suggest that non-identities $\neg I_n$ are projoin irreducible on finite domains. Indeed, $|\neg I_n|=|\mathcal{D}|^{n}-|\mathcal{D}|>|\mathcal{D}|^{n-2}$ for $n\geq 3$ and $|\mathcal{D}|\geq 2$, so hypostatic abstraction cannot reduce them. We will now exploit the relationship between projections and unions to show that this is not the case when $\mathcal{D}$ is sufficiently large. But that requires somewhat more general projoins than Fagin-type decompositions (17)-(18), namely, unions of joins that are not Cartesian products.

Example 7.

Recall the join reduction of $I_n$ from Example 3. Negating and applying de Morgan’s law, we get

\[ \neg I_n(x_1,\dots,x_n)=\neg I_2(x_1,x_n)\lor\dots\lor\neg I_2(x_{n-1},x_n)=\bigvee_{j=1}^{n-1}\neg I_2(x_j,x_n). \]

We cannot convert this disjunction into an existentially quantified formula because the attribute sets in each disjunct are different, but we can combine all of them into a single cover $\Lambda_i:=\{i,n\}$ for $i=1,\dots,n-1$. Then we define

\[ R_i^{j}(x,y):=\begin{cases}\neg I_2(x,y), & i=j\\ 1, & i\neq j,\end{cases} \]

so that

\[ \bigwedge_{i=1}^{n-1}R_i^{j}(x_i,x_n)=R_j^{j}(x_j,x_n)=\neg I_2(x_j,x_n), \]

and

\[ \neg I_n(x_1,\dots,x_n)=\bigvee_{j=1}^{n-1}\bigwedge_{i=1}^{n-1}R_i^{j}(x_i,x_n). \]

This disjunction is already amenable to “existentialization”. Assign a distinct α(j)𝒟𝛼𝑗𝒟\alpha(j)\in\mathcal{D}italic_α ( italic_j ) ∈ caligraphic_D to each disjunct, which requires |𝒟|n1𝒟𝑛1|\mathcal{D}|\geq n-1| caligraphic_D | ≥ italic_n - 1, and set

Ri(t,x,y):={Rij(x,y),t=α(j)0,tα(j),={¬I2(x,y),t=α(j),i=j1,t=α(j),ij0,tα(j).assignsubscript𝑅𝑖𝑡𝑥𝑦casessuperscriptsubscript𝑅𝑖𝑗𝑥𝑦𝑡𝛼𝑗otherwise0𝑡𝛼𝑗otherwisecasesformulae-sequencesubscript𝐼2𝑥𝑦𝑡𝛼𝑗𝑖𝑗otherwiseformulae-sequence1𝑡𝛼𝑗𝑖𝑗otherwise0𝑡𝛼𝑗otherwiseR_{i}(t,x,y):=\begin{cases}R_{i}^{j}(x,y),\,t=\alpha(j)\\ 0,\,t\neq\alpha(j),\end{cases}=\begin{cases}\neg I_{2}(x,y),\,t=\alpha(j),\,i=% j\\ 1,\,t=\alpha(j),\,i\neq j\\ 0,\,t\neq\alpha(j).\end{cases}italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t , italic_x , italic_y ) := { start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ( italic_x , italic_y ) , italic_t = italic_α ( italic_j ) end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL 0 , italic_t ≠ italic_α ( italic_j ) , end_CELL start_CELL end_CELL end_ROW = { start_ROW start_CELL ¬ italic_I start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x , italic_y ) , italic_t = italic_α ( italic_j ) , italic_i = italic_j end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL 1 , italic_t = italic_α ( italic_j ) , italic_i ≠ italic_j end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL 0 , italic_t ≠ italic_α ( italic_j ) . end_CELL start_CELL end_CELL end_ROW

Then the disjunction converts into a one parameter projoin of ternaries:

¬In(x1,,xn)=t[i=1n1Ri(t,xi,xn)].subscript𝐼𝑛subscript𝑥1subscript𝑥𝑛𝑡delimited-[]superscriptsubscript𝑖1𝑛1subscript𝑅𝑖𝑡subscript𝑥𝑖subscript𝑥𝑛\neg I_{n}(x_{1},\dots,x_{n})=\exists t\left[\bigwedge_{i=1}^{n-1}R_{i}(t,x_{i% },x_{n})\right].¬ italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = ∃ italic_t [ ⋀ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n - 1 end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t , italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ] . (19)

Thus, for n4𝑛4n\geq 4italic_n ≥ 4 and |𝒟|n1𝒟𝑛1|\mathcal{D}|\geq n-1| caligraphic_D | ≥ italic_n - 1 the non-n𝑛nitalic_n-identity is projoin reducible with a single parameter. A similar construction works for non-n𝑛nitalic_n-diversity relation based on the join reduction (10), but the condition is instead |𝒟|n2n2𝒟superscript𝑛2𝑛2|\mathcal{D}|\geq\frac{n^{2}-n}{2}| caligraphic_D | ≥ divide start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_n end_ARG start_ARG 2 end_ARG.

Going back to ¬Insubscript𝐼𝑛\neg I_{n}¬ italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, the required size of the domain can be traded for arity. If we use pairs of elements to index the disjuncts then only |𝒟|2n1superscript𝒟2𝑛1|\mathcal{D}|^{2}\geq n-1| caligraphic_D | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≥ italic_n - 1 is required, and similarly |𝒟|kn1superscript𝒟𝑘𝑛1|\mathcal{D}|^{k}\geq n-1| caligraphic_D | start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ≥ italic_n - 1 if we use k𝑘kitalic_k-tuples. But we must have k+2n1𝑘2𝑛1k+2\leq n-1italic_k + 2 ≤ italic_n - 1 so that it is still a reduction. Taking k=n3𝑘𝑛3k=n-3italic_k = italic_n - 3 and noticing that (n1)1n32superscript𝑛11𝑛32(n-1)^{\frac{1}{n-3}}\leq 2( italic_n - 1 ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_n - 3 end_ARG end_POSTSUPERSCRIPT ≤ 2 for n5𝑛5n\geq 5italic_n ≥ 5 we conclude that ¬Insubscript𝐼𝑛\neg I_{n}¬ italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is reducible for n5𝑛5n\geq 5italic_n ≥ 5 on any 𝒟𝒟\mathcal{D}caligraphic_D with |𝒟|2𝒟2|\mathcal{D}|\geq 2| caligraphic_D | ≥ 2 (albeit not necessarily to ternaries).
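The single-parameter reduction (19) can be checked by brute force on a small domain. The following Python sketch is only an illustration and not part of the argument: it assumes the domain is $\{0,\dots,d-1\}$ and uses the indexing $\alpha(j)=j-1$ (our choice; any injective $\alpha$ would do). All function names are hypothetical.

```python
from itertools import product

def non_identity(xs):
    """The non-n-identity: true when not all entries coincide."""
    return len(set(xs)) > 1

def R(i, t, x, y, n):
    """Ternary factor R_i(t, x, y), with the assumed indexing alpha(j) = j - 1."""
    if t > n - 2:          # t is not alpha(j) for any j = 1, ..., n-1
        return False
    j = t + 1              # the disjunct selected by the parameter t
    return x != y if i == j else True

def projoin_19(xs, domain):
    """Right-hand side of (19): exists t such that all R_i(t, x_i, x_n) hold."""
    n = len(xs)
    return any(
        all(R(i, t, xs[i - 1], xs[n - 1], n) for i in range(1, n))
        for t in domain
    )

def check_19(n, d):
    """Compare both sides of (19) on every n-tuple over a domain of size d."""
    domain = range(d)
    return all(
        projoin_19(xs, domain) == non_identity(xs)
        for xs in product(domain, repeat=n)
    )
```

Under these assumptions `check_19(4, 3)` holds, while `check_19(4, 2)` fails: on a two-element domain this particular encoding runs out of values to index the three disjuncts. This says nothing about whether other reductions exist there.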

The construction in the above example can be generalized to prove the following theorem.

Theorem 12.

Suppose an $n$-ary $R$ is join reducible to $N$ factors of arity at most $l$, and $N\leq|\mathcal{D}|^{n-l-1}$. Then $\neg R$ is projoin reducible. In particular, if $R$ has a $k$-key, with $k\leq n-4$ for $|\mathcal{D}|=2$ and $k\leq n-3$ for $|\mathcal{D}|\geq 3$, then $\neg R$ is projoin reducible.

Proof.

As in Example 7, we present $\neg R$ as a disjunction and then convert it into a projoin. To get enough tuples for $N$ disjuncts we need $N\leq|\mathcal{D}|^{k}$, where $k$ is the number of projoin parameters, and $k+l\leq n-1$ so that the converted factors still have lower arity than $R$. Taking the largest admissible $k=n-l-1$ gives the stated condition.

For the second claim, apply Theorem 5 to obtain a join reduction of $R$ with $N=n-k$ and $l=k+1$, and note that $N\leq|\mathcal{D}|^{n-l-1}$ becomes $n-k\leq|\mathcal{D}|^{n-k-2}$. By calculus, $x\leq 2^{x-2}$ for $x\geq 4$ and $x\leq d^{x-2}$ for $x\geq 3$ when $d\geq 3$. ∎
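The elementary inequalities at the end of the proof are easy to confirm numerically. The Python sketch below is an illustration with hypothetical helper names; it evaluates the condition $n-k\leq|\mathcal{D}|^{n-k-2}$ directly for $d\geq 2$ rather than reproducing the calculus argument.

```python
def key_condition(n, k, d):
    """Condition n - k <= d**(n - k - 2) from the proof (N = n - k, l = k + 1)."""
    x = n - k
    # for x = 1 the exponent is negative and the power is a float below 1 (d >= 2)
    return x <= d ** (x - 2)

def smallest_gap(d, max_gap=50):
    """Smallest x = n - k in 1..max_gap-1 for which x <= d**(x - 2) holds."""
    return next(x for x in range(1, max_gap) if x <= d ** (x - 2))
```

For instance, `smallest_gap(2)` and `smallest_gap(3)` recover the thresholds $n-k\geq 4$ and $n-k\geq 3$ that appear in the theorem.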

The next example gives a taste of the intricacies involved in ruling out general projoin reductions to prove unconditional projoin irreducibility.

Example 8.

Consider reducing $\neg I_{3}$ on a domain of size $|\mathcal{D}|=d$ with two parameters. As in Example 6, the ansatz reduces to

\[
\neg I_{3}(x_{1},x_{2},x_{3})=\exists t_{1}\exists t_{2}\left[A(t_{1},t_{2})\land\bigwedge_{i=1}^{3}\Big(P^{i}(t_{1},x_{i})\land Q^{i}(t_{2},x_{i})\Big)\right].
\]

Replacing $t_{1},t_{2}\in\mathcal{D}$ by indices $j,k\in\mathbb{N}_{d}$ we transform it into a disjunction

\[
\neg I_{3}(x_{1},x_{2},x_{3})=\bigvee_{j,k=1}^{d}a_{jk}\land\left[\bigwedge_{i=1}^{3}\Big(P^{i}_{j}(x_{i})\land Q^{i}_{k}(x_{i})\Big)\right].
\]

Since unaries represent subsets of $\mathcal{D}$, conjunctions with different $x_{i}$ their Cartesian products, and conjunctions with the same $x_{i}$ their intersections, we obtain a union of unary Cartesian products

\[
\neg I_{3}=\bigcup_{j,k=1}^{d}a_{jk}\left[\mathop{\times}_{i=1}^{3}\Big(P^{i}_{j}\cap Q^{i}_{k}\Big)\right],
\]

where $a_{jk}=0,1$ determines whether the term is included into the union. In contrast to Example 6, we can use up to $d^{2}$ terms in the union, but, as a tradeoff, the unary factors are not independent but must form a subset-valued rank $1$ Boolean matrix $R^{i}_{jk}=P^{i}_{j}\cap Q^{i}_{k}$ for each $i$. Not only do we have to check whether $\neg I_{3}$ splits into a union of up to $d^{2}$ unary Cartesian products, but also whether Boolean matrices $R^{i}_{jk}$ of their factors simultaneously factorize into outer products of Boolean vectors.
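The passage from the two-parameter ansatz to a union of unary Cartesian products can be illustrated on random data. The Python sketch below rests on several assumptions: the domain is $\{0,\dots,d-1\}$, the binaries $A$, $P^{i}$, $Q^{i}$ are arbitrary sets of pairs, and all function names are hypothetical. It does not search for a reduction of $\neg I_{3}$; it only confirms that the quantified form and the union form describe the same ternary relation.

```python
import random
from itertools import product

def random_binary(d, rng):
    """A random binary relation on {0, ..., d-1} as a set of pairs."""
    return {p for p in product(range(d), repeat=2) if rng.random() < 0.5}

def projoin_form(A, P, Q, d):
    """Triples with: exists t1, t2 such that A(t1,t2) and all P[i](t1,x_i), Q[i](t2,x_i)."""
    return {
        xs for xs in product(range(d), repeat=3)
        if any(
            (t1, t2) in A
            and all((t1, xs[i]) in P[i] and (t2, xs[i]) in Q[i] for i in range(3))
            for t1, t2 in product(range(d), repeat=2)
        )
    }

def union_form(A, P, Q, d):
    """Union over (j, k) in A of the boxes formed by the sets P^i_j & Q^i_k, i = 1, 2, 3."""
    result = set()
    for j, k in A:
        sides = [
            {x for x in range(d) if (j, x) in P[i] and (k, x) in Q[i]}
            for i in range(3)
        ]
        result |= set(product(*sides))
    return result
```

Here membership of $(j,k)$ in `A` plays the role of $a_{jk}=1$, and each `sides[i]` is the entry $R^{i}_{jk}$ of the subset-valued matrix.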

The Boolean matrix factorization problem is quite involved even for $0$-$1$ matrices [32], and with more parameters one would have to deal with simultaneous factorization of even more interdependent Boolean tensors. We can now observe the progression from a simple cardinality condition for key reductions (Theorem 9), to finding union decompositions with independent Cartesian factors for multikey reductions (Theorem 11), and to finding such decompositions with intricate tensor factorizations for general projoin reductions with many parameters. This should dispel the initial impression that with ‘enough’ parameters every relation ‘clearly should be’ projoin reducible. And it suggests a fruitful connection between reduction of relations and factorization of Boolean tensors, an active area of research in modern data science [31].

6 Bonds and teridentity

In this section we will connect the theory of projoin reductions motivated by database theory to the older theory of C.S. Peirce that supported his once controversial reduction thesis. While the main ideas are scattered in Peirce’s writings, they did not gain currency until the formalizations by Herzberger [17] and Burch [8]. For a modern mathematical approach see [13, 14].

Unlike Cartesian products and joins, projoin is not a single operation on attributed relations. Its definition additionally depends on the set of projected attributes. One may feel that this is too permissive and/or clumsy. A natural way to specify projected attributes intrinsically is to choose all and only those that are not shared by the factors, i.e. to project out the shared attributes. If we think of joining as selecting tuples that match on shared attributes and splicing them together then leaving out the matched parts can give a meaningful response to a query.

Let us call such special projoins pure. Purity is a significant restriction on the operation: while the Fagin-type projoins (17)-(18) are pure, the reduction (19) we constructed for $\neg I_{n}$ is not. In the case of two factors, the pure projoin is what Peirce called relative product (“relative” was his term for relation) [7]. For binary relations, functions in particular, it is simply their composition: $\exists y\left[P(x,y)\land Q(y,z)\right]$. Peirce considered the relative product more basic than joins and projections through which we defined it, and preferred to reverse the order of definitions.

Like Cartesian and Boolean products, the relative product is a commutative binary operation on attributed relations. This is because the positions with the quantified variable are determined by the shared attribute, not by the order of factors, so commutativity reflects the commutativity of conjunction. However, unlike the other two, the relative product is not associative. Consider binary relations $P(u,x),Q(u,y),R(u,z)$. Associating the first two first gives $\exists t\left[P(t,x)\land Q(t,y)\land R(u,z)\right]$, but the last two first gives $\exists t\left[P(u,x)\land Q(t,y)\land R(t,z)\right]$. And $\exists t\left[P(t,x)\land Q(t,y)\land R(t,z)\right]$ cannot be generated by relative products at all, so not even all pure projoins are generated. This is because identified variables (shared attributes) in joins remain free, and it does not matter whether we identify two of them at a time or more, we can repeat the exercise when iterating. But in the relative product identified variables are quantified over (projected out), and no new variable can be identified with them afterwards.
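The failure of associativity can be seen on a two-element domain. In the Python sketch below, which is only an illustration, an attributed relation is modelled as a pair (attribute set, set of rows), each row being a frozenset of (attribute, value) pairs. This encoding and the names `rel` and `relative_product` are assumptions made for the example, not notation from the text.

```python
def rel(attrs, rows):
    """Attributed relation: (set of attributes, set of rows as frozensets of pairs)."""
    attrs = tuple(attrs)
    return (frozenset(attrs), frozenset(frozenset(zip(attrs, row)) for row in rows))

def relative_product(r, s):
    """Join r and s on their shared attributes, then project the shared attributes out."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    shared = r_attrs & s_attrs
    out = set()
    for a in r_rows:
        da = dict(a)
        for b in s_rows:
            db = dict(b)
            if all(da[k] == db[k] for k in shared):      # rows match on shared attributes
                merged = {**da, **db}
                out.add(frozenset((k, v) for k, v in merged.items() if k not in shared))
    return (r_attrs ^ s_attrs, frozenset(out))

D = (0, 1)
eq = [(a, a) for a in D]                                 # the binary identity on D
P, Q, R = rel("ux", eq), rel("uy", eq), rel("uz", eq)

left = relative_product(relative_product(P, Q), R)      # (P * Q) * R
right = relative_product(P, relative_product(Q, R))     # P * (Q * R)
```

Both results carry the attributes $u,x,y,z$, but with these identity relations `left` requires $x=y$ and $u=z$, whereas `right` requires $u=x$ and $y=z$, so the two relations differ. Swapping the arguments of a single `relative_product` call leaves the result unchanged, in line with commutativity.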

In other words, in iterated relative products no attribute can be shared by more than two factors, and relative product is associative when restricted to triples of relations satisfying this condition. This is a further restriction on admissible projoins that restricts even relation schemes of factors that can appear in them. We will adopt Herzberger’s term bond for this restricted class of projoins, although our bond is slightly more permissive than his along the lines adopted in [14].

Definition 7.

A collection of attributed relations $R_{i}\subseteq\mathcal{D}^{\Lambda_{i}}$ is called bondable when no three of $\Lambda_{i}$ share an attribute. Their bond is then the projective join with all the shared attributes projected out, i.e. the projected set is $\Gamma:=\cup_{i}\Lambda_{i}\backslash\left(\cup_{i\neq j}\Lambda_{i}\cap\Lambda_{j}\right)$. A relation $R\subseteq\mathcal{D}^{n}$ is a bond over a cover $\textup{T}\cup\Sigma=\Lambda_{1}\cup\dots\cup\Lambda_{m}$ with $\textup{T}\cap\Sigma=\emptyset$ when it is a bond of some $R^{\Lambda_{i}}\subseteq\mathcal{D}^{\Lambda_{i}}$, called its bond factors, with elements of $\textup{T}$ called its (bond) parameters.
$R$ is called bond reducible when it is a bond with $0<|\Lambda_{i}|<|\Sigma|$.

Note that we allow $\textup{T}=\emptyset$, and so, in contrast to Peirce and Herzberger, Cartesian products are bonds, with $0$ parameters. This makes bonds a generalization of Cartesian products alternative to joins, and simplifies some formulations. In terms of predicates, our definition means that the sets of free variables in different bond factors are disjoint, and every bound variable is present in exactly two factors. So the projoins $\exists t\left[P(x,t)\land Q(x,t)\right]$, $\exists t\left[P(x,t)\land Q(y,z)\right]$ are not bonds, and neither are hypostatic abstractions like (12).

We will now show, following Peirce and Burch [8, 9], that, somewhat surprisingly, this severely restricted operation leads to essentially the same notion of reducibility as general projoins. Peirce’s first observation was that multiple variable identifications bond reduce to pairwise ones by using identity predicates, e.g.

\[
R_{1}(t,x_{\Lambda_{1}})\land\dots\land R_{n}(t,x_{\Lambda_{n}})=\exists t_{1}\dots\exists t_{n}\left[I_{n+1}(t,t_{1},\dots,t_{n})\land R_{1}(t_{1},x_{\Lambda_{1}})\land\dots\land R_{n}(t_{n},x_{\Lambda_{n}})\right]. \tag{20}
\]

And his second observation was that $n$-identities $I_{n}$ for $n\geq 4$ bond reduce to teridentities $I_{3}$:

\[
I_{n}(x_{1},\dots,x_{n})=\exists t_{1}\dots\exists t_{n-3}\left[I_{3}(x_{1},x_{2},t_{1})\land I_{3}(t_{1},x_{3},t_{2})\land\dots\land I_{3}(t_{n-3},x_{n-1},x_{n})\right]. \tag{21}
\]
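The chain of teridentities in (21) can be verified exhaustively for small $n$. The following Python sketch is an illustration under the assumption that the domain is $\{0,\dots,d-1\}$; the function names are hypothetical.

```python
from itertools import product

def identity(*args):
    """The k-identity I_k: true when all arguments coincide."""
    return len(set(args)) == 1

def teridentity_chain(xs, domain):
    """Right-hand side of (21) for the tuple xs, quantifying t_1..t_{n-3} over domain."""
    n = len(xs)
    for ts in product(domain, repeat=n - 3):
        left = (xs[0],) + ts          # x_1, t_1, ..., t_{n-3}
        mid = xs[1:n - 1]             # x_2, ..., x_{n-1}
        right = ts + (xs[n - 1],)     # t_1, ..., t_{n-3}, x_n
        if all(identity(a, b, c) for a, b, c in zip(left, mid, right)):
            return True
    return False

def check_21(n, d):
    """Compare both sides of (21) on every n-tuple over a domain of size d."""
    domain = range(d)
    return all(
        teridentity_chain(xs, domain) == identity(*xs)
        for xs in product(domain, repeat=n)
    )
```

The three zipped sequences list the first, second and third arguments of the $n-2$ teridentities, so each consecutive pair of factors shares exactly one quantified variable, as a bond requires.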

Projoins with quantification into a single position can also be reduced to bonds by replacing the factors with the quantified variables by factors of lower arity, as in $\widetilde{P}(x_{\Lambda}):=\exists t\,P(x_{\Lambda},t)$. Applying the above identities converts any projoin into a bond of the original factors (up to renaming of attributes), their projections, and teridentities. Let us call this conversion bond explication. For example, the bond explication of $\exists\,t\left[P(t,x_{1},x_{2},t)\land\exists\,s\,Q(x_{2},x_{3},s,t)\right]$ is

\[
\exists\,t_{1}\exists\,t_{2}\exists\,t_{3}\,\exists\,y_{1}\exists\,y_{2}\big[I_{3}(t_{1},t_{2},t_{3})\land I_{3}(y_{1},x_{2},y_{2})\land P(t_{1},x_{1},y_{1},t_{2})\land\exists\,s\left[I_{1}(s)\land Q(y_{2},x_{3},s,t_{3})\right]\big]. \tag{22}
\]
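That the explicated formula (22) defines the same relation as the original projoin can be tested on random inputs. The Python sketch below is an illustration only: it assumes a two-element domain, random quaternary relations $P$ and $Q$, and reads $I_{1}(s)$ as holding for every $s$. The function names are hypothetical.

```python
import random
from itertools import product

D = (0, 1)

def random_relation(arity, rng):
    """A random relation of the given arity on D, as a set of tuples."""
    return {row for row in product(D, repeat=arity) if rng.random() < 0.5}

def original(P, Q, x1, x2, x3):
    """exists t [ P(t, x1, x2, t) and exists s Q(x2, x3, s, t) ]"""
    return any(
        (t, x1, x2, t) in P and any((x2, x3, s, t) in Q for s in D)
        for t in D
    )

def explicated(P, Q, x1, x2, x3):
    """Formula (22): teridentities link t1, t2, t3 and y1, x2, y2; I_1(s) is always true."""
    return any(
        t1 == t2 == t3 and y1 == x2 == y2
        and (t1, x1, y1, t2) in P
        and any((y2, x3, s, t3) in Q for s in D)
        for t1, t2, t3, y1, y2 in product(D, repeat=5)
    )
```

In `explicated` every quantified variable occurs in exactly two factors and every free variable in exactly one, which is the bond condition; in `original` the variable $t$ occurs in three positions and $x_{2}$ is shared by two factors.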

Since projection does not increase arity, and explication can add only ternaries, we have the following theorem.

Theorem 13.

An $n$-ary with $n\geq 4$ is projoin reducible if and only if it is bond reducible. A ternary is projoin reducible if and only if it decomposes into a bond of unaries, binaries and teridentities.

Proof.

The only claim not covered by bond explication is that ternaries that are bonds of unaries, binaries and teridentities are projoin reducible. Given such a bond, assign a new variable to each teridentity and replace by it all occurrences of the original variables from the teridentity. Then remove the teridentity and the quantifiers over its variables. By (20), this produces an equivalent expression, and, since all teridentities are removed, it is a projoin of unaries and binaries only. ∎

In particular, the projoin reduction (19) of $\neg I_{n}$ can be transformed not just into a pure one, but even into a bond reduction. Note that hypostatic abstraction (15) with $k=1$, when it applies, decomposes any ternary into a bond of binaries and teridentities. This gives us a bond analog of Corollary 1.

Theorem 14 (Peirce’s reduction thesis).

Any $n$-ary with $n\geq 3$ on an infinite domain reduces to a bond of unaries, binaries and teridentities. The only bond irreducible relations on such domains are all unaries, non-degenerate binaries, and non-degenerate ternaries.

The first, reducibility, clause of the thesis is a direct consequence of Corollary 1 and Theorem 13. Indeed, it is essentially equivalent to projoin reducibility to binaries alone. Ironically, Löwenheim’s result [27] to that effect once made it controversial due to the confusion between these closely related notions of reducibility [9, 21].

The second, irreducibility, clause, as applied to unaries and binaries, also follows trivially. Since bond is a special case of projoin, and they are projoin irreducible, they are all the more bond irreducible. However, the part concerning ternaries is non-trivial. It will follow from a graph-theoretic argument in Section 9 (Theorem 16).

While bond and projoin reducibilities are (almost) the same, bond reductions are much more special than general projoin reductions. Peirce felt that general projoin reductions conceal the complexity involved in attribute matching (variable identifications), and, as a result, do not provide “true” or “complete” analysis of a relation delivered by bonds [7, 21].

7 Bonding diagrams

In this section we introduce a graphical representation of joins, projoins and bonds which pictures the structure of relations by analogy to diagrams of chemical decompositions of compounds into elements. It also allows us to bring graph-theoretic methods into the analysis of reductions. The diagrams we describe are simplified versions of Peirce’s existential graphs [14] covering only a fragment of predicate logic (conjunction and existential quantifier) without the associated graphical calculus. We use some standard notation and terminology from graph theory [4] throughout this and the following sections.

Definition 8.

The projoin graph is a labeled bipartite graph with vertices for each factor and each attribute of the projoin. An edge joins them when the attribute is in the relation scheme of the factor. The vertices are labeled by relation and attribute names, and sorted into predicate vertices (for factors), free attribute vertices (for projected attributes) and bound attribute vertices (for projected out attributes). The valency of a vertex is its graph-theoretic degree (the number of incident edges) for predicate and bound attribute vertices, and the graph-theoretic degree increased by $1$ for free attribute vertices.

Figure 1: Projoin graphs for a) $\exists\,t[P(x_{1})\land Q(x_{1},t)\land R(t,x_{2})]$; b) $\exists\,s\exists\,t[P(x_{1},s,t)\land Q(s,t,x_{2})]$.

In practice, we depict predicate vertices as predicate letters and attribute vertices as dots labeled by variables. This makes it easier to associate graphs to predicate formulas, and, since only matching of attributes matters in decompositions rather than their proper names, variable labels work well enough for our purposes. Free attribute vertices are additionally labeled by an extra stem (hanging edge) coming out of them, which explains the valency convention. Examples of projoin graphs and the corresponding predicate formulas are shown in Figure 1.

Projoin graphs can get quite cluttered and are made somewhat more readable by converting them into bonding diagrams defined below. These diagrams are also better equipped to depict bonds.

Definition 9.

The bonding diagram is obtained from the projoin graph by replacing each bivalent bound vertex by an edge connecting the corresponding predicate vertices, and replacing each bivalent free vertex by a hanging edge from the corresponding predicate vertex. The new edges carry the attribute labels of the removed vertices, and the labels of free attribute vertices are moved to their stems. We call the remaining attribute vertices of valency greater than $1$ branch points, and of valency $1$ dead ends. Hanging edges, incident to predicate vertices and branch points, are called loose ends. Bonding diagrams of bonds are called bond diagrams.

Figure 2: Bonding diagrams of projoins from Figure 1.

Loose ends correspond to free variables, and dead ends to bound variables occurring in a single position. Branch points are the device for identifying variables in different predicates – all edges attached to a branch point carry the same variable. The variable is free when one of the attached edges is a loose end, as in the T-shaped link in Figure 2 a), otherwise it is existentially quantified, as $t$ in Figure 3 a). When a variable appears in only two predicates, no branch point is necessary; a simple edge connecting them suffices. Thus, bonding diagrams of bonds are graphically distinguished by having no branch points. When two or more variables appear in the same two predicates, the diagram displays a multiedge connecting their vertices, as in Figure 2 b).
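
The passage from projoin graphs to bonding diagrams can be sketched in a few lines of Python. The helper `classify_attributes` and its string labels are hypothetical names chosen for this illustration; the sketch only sorts attributes by valency, as in Definition 9, and does not draw anything.

```python
def classify_attributes(factors, free):
    """Sort the attributes of a projoin into parts of its bonding diagram.

    factors: dict, predicate name -> tuple of attribute names in its scheme
    free:    set of free attributes
    """
    kinds = {}
    for attr in {a for attrs in factors.values() for a in attrs}:
        count = sum(attr in attrs for attrs in factors.values())
        valency = count + (1 if attr in free else 0)
        if valency == 1:
            kinds[attr] = "dead end"          # bound, occurs in a single factor
        elif valency == 2:
            # bivalent vertices are replaced by edges
            kinds[attr] = "loose end" if attr in free else "edge"
        else:
            kinds[attr] = "branch point"      # valency 3 or more
    return kinds

# The two projoins of Figure 1
fig1a = {"P": ("x1",), "Q": ("x1", "t"), "R": ("t", "x2")}
fig1b = {"P": ("x1", "s", "t"), "Q": ("s", "t", "x2")}
kinds_a = classify_attributes(fig1a, free={"x1", "x2"})
kinds_b = classify_attributes(fig1b, free={"x1", "x2"})
print(kinds_a)
print(kinds_b)
```

For the first projoin `x1` comes out as a branch point (the T-shaped link), while the second has no branch points and two parallel edges `s`, `t` between `P` and `Q`, i.e. a multiedge.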

Note that if a bonding diagram has disconnected subdiagrams then the relations they represent are Cartesian factors of the original, except for the case when they have no free attributes, i.e. are closed formulas. If they are true all predicates in them can be dropped without any loss, and if false the factored relation is itself empty. From now on we will only consider projoins without such redundant predicates and call them non-redundant.

Corollary 2.

If a non-redundant bonding diagram of a relation is disconnected then the relation is degenerate. The connected components are bonding diagrams of its Cartesian factors.

The converse is false for the trivial reason that one can use degenerate predicates in a reduction. But if a relation is degenerate it admits reductions with disconnected and non-redundant bonding diagrams.

We intentionally used a two-step definition instead of defining bonding diagrams directly to emphasize the singling out of bivalent vertices (pairwise attribute identifications), which highlights binary bonding, i.e. relative products. A bond diagram will have no attribute vertices, and all its attribute labels will attach to edges, including hanging edges.

Our next observation is that bond explication also has a simple graphical interpretation in terms of bonding diagrams.

Figure 3: a) Bond explication of $\exists\,t\left[P(t,x_1,x_2,t)\land\exists\,s\,Q(x_2,x_3,s,t)\right]$ as (22), here $\widetilde{Q}(t_3,y_2,x_3):=\exists s\,Q(t_3,y_2,x_3,s)$; b) bonding diagram of the reduction (21) of $n$-identity to teridentities.

Corollary 3.

The bonding diagram of a bond explicated projoin is obtained from its original bonding diagram by absorbing dead ends into the adjacent predicate vertices, and replacing $n$-valent branch points by the diagrams of bond reductions of $n$-identities to teridentities (Figure 3).

It is particularly pronounced in the diagrams that attribute identifications (branch points) function like hidden factors in projoin reductions. Indeed, nothing substantive distinguishes them from predicate vertices in assembling the relation. Thus, one can see bond explication as analogous to adjoining “ideal elements” ($I_n$) to uniformize factorizations in ring algebra.

8 Complete reductions and ternarity

So far we considered only general reductions, not complete reductions down to irreducibles. In this section we will start looking at the structure of complete reductions, but, in the light of Peirce’s reduction thesis, we consider only relations reducible to unaries, binaries and ternaries. They form a bond subalgebra of all relations, which is of interest even if irreducible higher arity relations exist on finite domains.

One can see from bonding diagrams that much work at putting a relation together is done by branch points. Hypostatic abstraction, for example, has a single branch point that holds together an otherwise loose collection of binaries. Bond explication removes branch points, but at a price of adding ternaries to the reduction. This suggests that ternaries play the role of relays in information exchange among the attributes, and their number quantifies the ‘complexity of relating attributes’. Mutual information between attributes has been studied in the context of database theory [29], and, more recently, as a measure of information integration in biological systems [36]. Another potential application is to designing conceptual schemas of databases friendly to natural language and human representation of knowledge and reasoning, as in Sowa’s conceptual graphs that are based on bonding diagrams [11]. For more motivation and further discussion we refer to [21].

Thus, we will be interested in counting the number of ternaries in complete reductions of a relation, assuming that it is reducible to unaries, binaries and ternaries. Of course, this number may vary from one reduction to another, and what we really want is the minimal number over all possible reductions.

Definition 10.

A bond is called subternaric when all of its factors have arity at most $3$. Ternarity of a relation, denoted $\mathbf{ter}$, is the minimal number of ternaries in its subternaric bond reductions, and $\infty$ if no such reductions exist. Non-redundant bond reductions with $\mathbf{ter}$ ternaries will be called minimal bond reductions.

Ternarity of unaries and binaries is obviously $0$, and of ternaries is at most $1$. By the reducibility clause of Peirce’s reduction thesis (Theorem 14), all relations on infinite domains have subternaric bond reductions. Whether this holds on finite domains, i.e. whether $\mathbf{ter}(R)<\infty$ for all $R$, is an open problem, equivalent to projoin reducibility of all relations to binaries by Theorem 13.

The following lemma is a direct consequence of the definitions.

Lemma 2.

Ternarity is subadditive on relative products. For a projoin $R$ with factors $R_i$, free attributes indexed by $j$ with the $j$-th shared by $m_j$ factors, and bound attributes indexed by $k$ with the $k$-th shared by $n_k$ factors,

$$\mathbf{ter}(R)\leq\sum_i\mathbf{ter}(R_i)+\sum_j(m_j-1)+\sum_k(n_k-2). \tag{23}$$

Proof.

Subadditivity is obvious because bonding two bond diagrams does not add predicate vertices or branch points. In the projoin graph of $R$, aside from $R_i$ vertices, we have free branch points of valencies $m_j+1$, and bound branch points of valencies $n_k$. According to (20)-(21), bond explication replaces the former with $m_j+1-2$ teridentities and the latter with $n_k-2$ teridentities, hence the sum in (23). ∎
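
To make the count in (23) concrete, the following Python sketch evaluates its right-hand side from the three lists of numbers that enter the formula. The function name `ternarity_bound` and its argument names are assumptions of this sketch; it also assumes that dead ends have already been absorbed, so that every bound attribute is shared by at least two factors.

```python
def ternarity_bound(factor_ters, free_shares, bound_shares):
    """Right-hand side of (23).

    factor_ters:  upper bounds on ter(R_i), one per factor
    free_shares:  m_j, the number of factors sharing the j-th free attribute
    bound_shares: n_k, the number of factors sharing the k-th bound attribute
    """
    # Assumption of this sketch: dead ends (n_k = 1) were absorbed beforehand.
    assert all(m >= 1 for m in free_shares)
    assert all(n >= 2 for n in bound_shares)
    return (sum(factor_ters)
            + sum(m - 1 for m in free_shares)
            + sum(n - 2 for n in bound_shares))

# Figure 1 a): P unary, Q and R binary; x1 is free and shared by P and Q,
# x2 is free in R only, t is bound and shared by Q and R.
bound_fig1a = ternarity_bound([0, 0, 0], free_shares=[2, 1], bound_shares=[2])
print(bound_fig1a)
```

The single teridentity counted here comes from the trivalent branch point at `x1`.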

Bounds on ternarity from above can be obtained from constructions of bond reductions. Recall that any projoin reduction can be explicated into a bond reduction, and one can obtain projoin reductions by using keys (Theorem 9). However, only $1$- or $2$-keys produce subternaric reductions because for a $k$-key the factors have arity $k+1$.

Theorem 15.

If an $n$-ary $R$ admits a $1$-key then $\mathbf{ter}(R)\leq n-2$, if it admits a $2$-key then $\mathbf{ter}(R)\leq 3n-4$, and if it already has a $2$-key then $\mathbf{ter}(R)\leq 3n-8$.

Proof.

Hypostatic abstraction (15) with $k=1$ reduces $R$ to a projoin of $n$ binaries with a single shared attribute, which is bound and shared by all $n$ factors. Therefore, the first two terms in (23) vanish and the last one produces $n-2$. In the case of $k=2$ we obtain a projoin of $n$ ternaries with two bound shared attributes, each by all $n$ factors. Therefore, the right hand side of (23) reduces to $n+2(n-2)=3n-4$. When $R$ has a $2$-key, there are only $n-2$ factors, and the shared attributes are now free, so the count changes to $n-2+2(n-2-1)=3n-8$. ∎
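
The three counts in the proof can be checked mechanically. The Python sketch below feeds the configurations described in the proof into the right-hand side of (23); the helper names `rhs23` and `key_bounds` are hypothetical, and each ternary factor is counted with ternarity $1$, its largest possible value.

```python
def rhs23(factor_ters, free_shares, bound_shares):
    """Right-hand side of (23) for given ter(R_i), m_j and n_k."""
    return (sum(factor_ters)
            + sum(m - 1 for m in free_shares)
            + sum(n - 2 for n in bound_shares))

def key_bounds(n):
    """Bounds of Theorem 15 for an n-ary relation, read off from (23)."""
    # 1-key admitted: n binaries, one bound attribute shared by all n factors
    admits_1 = rhs23([0] * n, [1] * n, [n])
    # 2-key admitted: n ternaries, two bound attributes shared by all n factors
    admits_2 = rhs23([1] * n, [1] * n, [n, n])
    # 2-key present: n - 2 ternaries, the two key attributes are free and
    # shared by all n - 2 factors, the other n - 2 attributes are unshared
    has_2 = rhs23([1] * (n - 2), [n - 2, n - 2] + [1] * (n - 2), [])
    return admits_1, admits_2, has_2

for n in range(3, 10):
    assert key_bounds(n) == (n - 2, 3 * n - 4, 3 * n - 8)
print(key_bounds(4))
```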

It is interesting that there is no drop in ternarity bound when the relation already has a $1$-key as opposed to just admitting one. There is a change from projoin to join, but all the ternaries come from the branch point that has valency $n$ in both cases. Other special constructions also provide upper bounds. For example, it follows from (19) that $\mathbf{ter}(\neg I_n)\leq 3n-6$ for $n\geq 4$ and $|\mathcal{D}|\geq n-1$. Bounds from below are conceptually harder because we have to rule out all bonds with fewer ternaries as reductions. We will obtain such a bound for non-degenerate relations in the next section by exploiting graph-theoretic properties of bond diagrams.

Upon reflection, absence of reducible factors in a reduction is too weak a property. Not only can complete reductions have redundant predicates, as long as those are irreducible, but they may not be minimal. For example, four teridentities bonded in a square reduce $I_4$ to irreducibles, but hypostatic abstraction gives a minimal reduction with only two teridentities.

On the other hand, minimal reductions can be incomplete for only trivial reasons. And they can be converted into complete reductions by a trimming procedure that is reflected in diagrams by merging of predicate vertices. In algebraic terms, when two factors share bound attribute(s) we replace them in the bond by their relative product. Merging cannot be used on a pair of ternaries with a single shared attribute, because it creates a quaternary, but in all other cases the bond remains subternaric. For example, the bond on Figure 2 b) can be merged into a single binary. The next lemma uses merging to produce complete minimal reductions, and gives additional support to discounting unaries and binaries in ternarity counts.
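
Merging can be illustrated on relations stored as sets of tuples. The following Python sketch computes a relative product by joining two factors on their common attributes and projecting out the bound ones; the function `relative_product`, the attribute-name encoding and the sample data are all assumptions of this illustration.

```python
def relative_product(P, p_attrs, Q, q_attrs, bound):
    """Join P and Q on their common attributes, then project out `bound`.

    P, Q:             relations as sets of tuples
    p_attrs, q_attrs: attribute names of the columns of P and Q
    bound:            set of shared attributes to be existentially quantified
    """
    shared = [a for a in p_attrs if a in q_attrs]
    merged = list(p_attrs) + [a for a in q_attrs if a not in p_attrs]
    out_attrs = [a for a in merged if a not in bound]
    result = set()
    for p in P:
        row_p = dict(zip(p_attrs, p))
        for q in Q:
            row_q = dict(zip(q_attrs, q))
            if all(row_p[a] == row_q[a] for a in shared):
                row = {**row_p, **row_q}
                result.add(tuple(row[a] for a in out_attrs))
    return out_attrs, result

# Shape of Figure 2 b): exists s exists t [P(x1, s, t) and Q(s, t, x2)]
P = {(0, 0, 1), (1, 1, 0)}
Q = {(0, 1, 5), (1, 0, 7), (1, 1, 9)}
attrs, merged_relation = relative_product(
    P, ("x1", "s", "t"), Q, ("s", "t", "x2"), bound={"s", "t"})
print(attrs, merged_relation)
```

Two ternaries sharing two bound attributes merge into a single binary on `x1`, `x2`, which is why such multiedges cannot survive in a minimal reduction.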

Lemma 3.

Let $R$ be a subternarily reducible relation.

(i) In any minimal reduction of $R$ any two factors share at most one attribute, i.e. the bonding diagram has no multiedges.

(ii) If $R$ does not have an unary Cartesian factor then its minimal reductions have no unaries at all.

(iii) If $R$ does not have a binary Cartesian factor then there exist its minimal reductions with no binaries at all.

Figure 4: Schematic bond diagrams, the dots stand for predicate vertices: a) subcubic multiedges; b) merging unaries; c) merging binaries.

Proof.

(i) Possible multiedge configurations are shown on Figure 4 a). Two of them have no loose ends and cannot occur because minimal reductions are non-redundant. In the other two merging would eliminate a ternary, so they cannot occur by minimality.

(ii) If the unary shares its attribute with another factor there are three cases. It is another unary and they form a redundant component, which is ruled out. It is a ternary and merging will turn it into a binary, which is also ruled out. Finally, if it is a binary then merging will reduce it to an unary, and we can repeat the process, Figure 4 b). By induction on the number of binaries, it must stop, and it can only stop when the unary’s attribute is free (we hit a loose end). But then $R$ has an unary Cartesian factor, contrary to the assumption.

(iii) By (i), a binary can share at most one attribute with another factor, and when it does, merging reduces the number of binaries, Figure 4 c). By induction, all binaries can be eliminated except for those with both attributes free, i.e. binary Cartesian factors. ∎

After applying the Lemma’s procedure, the only reducible factors left, if any, are degenerate binary Cartesian factors. Factoring them into pairs of unary factors produces a complete minimal reduction.

9 Ternarity and Peirce’s reduction thesis

In this section we will use ternarity and graph theory to refine Peirce’s reduction thesis on infinite domains, and show that its strengthened form fails dramatically on finite domains.

Definition 11.

The bond graph is obtained from the bond diagram by placing additional vertices at the loose ends and removing the labels.

The bond graph is just a multigraph of graph theory [4]. If the bond was subternaric then the multigraph will be subcubic (subtrivalent), i.e. have vertices of degree at most $3$. Such graphs are widely studied, particularly due to applications in structural chemistry and knot theory. By Lemma 3, when the bond is a minimal reduction of a relation without unary Cartesian factors, the bond graph is a simple graph and its vertices of degree $1$, called pendants in graph theory, are in $1$-$1$ correspondence with the relation’s attributes.

To prove the next lemma, we will need a graph-theoretic formula originally due to Listing [26]. Let $V$ be the number of vertices, $E$ the number of edges, $C$ the number of fundamental cycles, and $K$ the number of connected components, then $V-E+C-K=0$. This formula is often used in modern graph theory as the definition of $C$, called the cyclomatic number, which is then proved to be equal to the number of fundamental cycles [4].

Lemma 4.

If $I, I\!I, I\!I\!I$ denote the numbers of vertices of degrees $1,2,3$, respectively, in a subcubic multigraph then $I\!I\!I-I=2(C-K)$. In particular, $I$ and $I\!I\!I$ have the same parity. If the graph is connected with at least $n$ pendants then $I\!I\!I\geq n-2$.

Proof.

In a subcubic multigraph we have $I+I\!I+I\!I\!I=V$. And, by the handshaking theorem, the sum of all vertex valencies is twice the number of its edges, so $I+2I\!I+3I\!I\!I=2E$. Therefore, $V-E=\frac{1}{2}(I-I\!I\!I)$. It remains to multiply both sides by $2$ and note that $V-E=-(C-K)$ by Listing’s formula. In a connected multigraph $K=1$, so $I\!I\!I=I-2+2C\geq n-2$. ∎
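
The identity of Lemma 4 is easy to test on small multigraphs. In the Python sketch below, $K$ is found with a union-find structure and $C$ is counted independently as the number of edges that close a cycle in a spanning forest; the function name `lemma4_counts` and the edge-list encoding are assumptions of this sketch.

```python
def lemma4_counts(num_vertices, edges):
    """Count I, III, C, K for a multigraph with all degrees in {1, 2, 3}."""
    degree = [0] * num_vertices
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    C = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1                      # a loop (u == v) contributes 2
        ru, rv = find(u), find(v)
        if ru == rv:
            C += 1                          # edge closes a fundamental cycle
        else:
            parent[ru] = rv                 # edge joins two components
    assert all(1 <= d <= 3 for d in degree), "graph must be subcubic"
    K = len({find(v) for v in range(num_vertices)})
    I, III = degree.count(1), degree.count(3)
    return {"I": I, "III": III, "C": C, "K": K,
            "listing": num_vertices - len(edges) + C - K == 0,
            "lemma4": III - I == 2 * (C - K)}

# A cubic vertex with three pendants, and two vertices joined by a triple edge
star = lemma4_counts(4, [(0, 3), (1, 3), (2, 3)])
theta = lemma4_counts(2, [(0, 1), (0, 1), (0, 1)])
print(star)
print(theta)
```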

It is instructive to give a direct intuitive argument for the case $n=3$. Since the multigraph is connected there exist paths going from two of the pendants to the third. They must meet at some vertex, and then proceed jointly to the destination (of course, they may meet and diverge several times). That meeting vertex must have degree $3$.

Corollary 4.

If $R$ does not have an unary Cartesian factor then its ternarity and arity have the same parity.

Proof.

We have $\mathbf{ter}(R)=I\!I\!I$ in the bond graph of a minimal bond reduction of $R$. By Lemma 3 (ii), it contains no unaries, so the only pendants come from loose ends, and $I$ is the arity of $R$. The result now follows directly from Lemma 4. ∎

The no-unary-factor condition cannot be dropped: the quaternary $P(u)\land Q(x,y,z)$ with non-degenerate $Q$ has arity $4$ but ternarity $1$. Indeed, it is already a bond with the single ternary $Q$, and $Q$ cannot be bonded from unaries and binaries alone (Theorem 16).

Peirce’s reduction thesis is essentially equivalent to $1\leq\mathbf{ter}(R)<\infty$ for non-degenerate $R$ with $n\geq 3$. The next theorem gives us the exact number, on infinite domains, and the promised irreducibility of non-degenerate ternaries, on any domains.

Theorem 16.

Ternarity of any non-degenerate $n$-ary $R$ with $n\geq 2$ satisfies $\mathbf{ter}(R)\geq n-2$. In particular, any non-degenerate ternary is bond irreducible. If, moreover, $|R|\leq|\mathcal{D}|$ then $\mathbf{ter}(R)=n-2$. In particular, $\mathbf{ter}(R)=n-2$ for all relations on infinite domains.

Proof.

Consider a bond decomposition of $R$. Since $R$ is non-degenerate its bond multigraph is connected, and has at least $n$ pendants coming from the loose ends, the free variables. Therefore, $I\geq n$. The lower bound $\mathbf{ter}(R)\geq n-2$ now follows directly from Lemma 4. For $n=3$ this means that any bond decomposition of a non-degenerate ternary must contain a ternary, i.e. such ternaries are bond irreducible.

If $|R|\leq|\mathcal{D}|$ then, by Lemma 1, it admits a $1$-key, and, by Theorem 15, $\mathbf{ter}(R)\leq n-2$, hence $\mathbf{ter}(R)=n-2$. On infinite domains $|R|\leq|\mathcal{D}|$ holds for all relations. ∎
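
The construction behind the upper bound can be sketched in Python for a small finite relation. We assume here that hypostatic abstraction with a $1$-key takes the form $R(x_1,\dots,x_n)=\exists\,t\,[B_1(t,x_1)\land\dots\land B_n(t,x_n)]$, where $t$ labels the tuples of $R$ injectively by domain elements; the function names and the sample relation are hypothetical, and the relation is assumed non-empty.

```python
from itertools import product

def hypostatic_abstraction(R, domain):
    """Split an n-ary relation with |R| <= |domain| into n binaries B_i(t, x_i)."""
    tuples = sorted(R)
    labels = list(domain)[:len(tuples)]     # injective labels of tuples: the 1-key
    assert len(labels) == len(tuples), "needs |R| <= |D|"
    n = len(tuples[0])
    return [{(t, row[i]) for t, row in zip(labels, tuples)} for i in range(n)]

def recombine(binaries, domain):
    """Evaluate exists t [B_1(t, x_1) and ... and B_n(t, x_n)] by brute force."""
    n = len(binaries)
    return {xs for xs in product(domain, repeat=n)
            if any(all((t, xs[i]) in binaries[i] for i in range(n))
                   for t in domain)}

domain = (0, 1, 2)
R = {(0, 1, 2), (1, 1, 0), (2, 0, 0)}       # |R| = 3 <= |D| = 3
binaries = hypostatic_abstraction(R, domain)
print(recombine(binaries, domain) == R)
```

The shared bound attribute `t` is a single branch point of valency $n$, so its explication contributes the $n-2$ teridentities of the bound.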

The next example shows that the inequality in the lower bound on $\mathbf{ter}$ can be strict on finite domains.

Example 9 (Herzberger’s quaternary).

Consider the quaternary $H$ on $\mathcal{D}=\{\alpha,\beta,\gamma\}$ introduced by Herzberger in [17] and given by the table below.

$$
H:=\begin{array}{|cccc|}
\hline
\alpha & \beta & \beta & \alpha\\
\beta & \alpha & \alpha & \beta\\
\gamma & \beta & \gamma & \beta\\
\beta & \gamma & \beta & \gamma\\
\hline
\end{array}
\qquad
H^{1,2,3}=\begin{array}{|ccc|}
\hline
\alpha & \beta & \beta\\
\beta & \alpha & \alpha\\
\gamma & \beta & \gamma\\
\beta & \gamma & \beta\\
\hline
\end{array}
\qquad
H^{1,2,4}=\begin{array}{|ccc|}
\hline
\alpha & \beta & \alpha\\
\beta & \alpha & \beta\\
\gamma & \beta & \beta\\
\beta & \gamma & \gamma\\
\hline
\end{array}
$$

One can check by cases that $H$ is non-degenerate. We have $|H|=4>3=|\mathcal{D}|$, and Herzberger shows by combinatorial search that $H$ is, indeed, not a relative product of two ternaries (actually, he only shows that for one partition of attributes, but the argument works analogously for others). Therefore, $\mathbf{ter}(H)>2$.

However, $H$ is not a counterexample to reducibility. One can see by inspection that it has a $2$-key (in fact, any two of its columns are a $2$-key). Therefore, by Theorem 5, it is a join of two ternaries, e.g. $H=H^{1,2,3}\Join H^{1,2,4}$ if we pick the first two columns as the $2$-key. Bond explication (20) of the two shared attributes converts this join into a bond of four ternaries, its projections $H^{1,2,3}$, $H^{1,2,4}$, and two teridentities. Since $\mathbf{ter}(H)$ must be even by Lemma 4 and $\mathbf{ter}(H)>2$ we conclude that $\mathbf{ter}(H)=4$.
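
The inspection claims about $H$ can be replayed in Python. The sketch below reads the table of $H$ row by row, takes "columns form a key" to mean that no two tuples agree on all of them, and uses helper names (`project`, `is_key`, `join_on_first_two`) that are our own.

```python
from itertools import combinations

a, b, c = "alpha", "beta", "gamma"
H = {(a, b, b, a), (b, a, a, b), (c, b, c, b), (b, c, b, c)}

def project(R, cols):
    """Projection of R onto the given (0-based) columns."""
    return {tuple(row[i] for i in cols) for row in R}

def is_key(R, cols):
    """Columns form a key when no two tuples agree on all of them."""
    return len(project(R, cols)) == len(R)

def join_on_first_two(A, B):
    """Natural join of ternaries A(x1, x2, x3) and B(x1, x2, x4) on x1, x2."""
    return {(p[0], p[1], p[2], q[2]) for p in A for q in B if p[:2] == q[:2]}

all_pairs_are_keys = all(is_key(H, cols) for cols in combinations(range(4), 2))
some_column_is_key = any(is_key(H, (i,)) for i in range(4))
H123, H124 = project(H, (0, 1, 2)), project(H, (0, 1, 3))
join_recovers_H = join_on_first_two(H123, H124) == H
print(all_pairs_are_keys, some_column_is_key, join_recovers_H)
```

No single column is a key, in line with $|H|>|\mathcal{D}|$, while every pair of columns is, and the join of the two projections returns $H$. The sketch does not attempt Herzberger’s combinatorial search.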

Herzberger’s observation can be strengthened by relating ternarity to the number of parameters in projoin reductions and applying Theorem 10. It turns out that linear bounds on ternarity in terms of arity, as in Theorem 15, are not typical for general relations. However, we cannot infer existence of relations of infinite ternarity, i.e. of irreducible $n$-aries with $n\geq 4$, and hence refute Peirce’s original thesis.

Theorem 17.

The share of $n$-ary relations with $n\geq 4$ and $\mathbf{ter}(R)\leq m$ among all such relations on a domain $\mathcal{D}$ is $<1$ for $|\mathcal{D}|>\binom{\frac{3m+n}{2}}{n-1}$, and asymptotically vanishes when $|\mathcal{D}|\to\infty$. In particular, there exist $n$-ary relations of arbitrarily high ternarity.

Proof.

Suppose $R$ is non-degenerate with $\mathbf{ter}(R)\leq m$. By Lemma 3, there is a purely ternary minimal reduction of it. In its bond graph $I=n$, $I\!I\!I=\mathbf{ter}(R)$, and $2E=I+3I\!I\!I$ by the handshaking theorem. Of the edges, $n$ are incident to pendants and correspond to free variables, while $k:=E-n$ correspond to bond parameters. Therefore, our minimal reduction is a projoin reduction with $k=\frac{3\,\mathbf{ter}(R)-n}{2}\leq\frac{3m-n}{2}$ parameters.

By Theorem 10, the share of relations projoin reducible with $k$ parameters is $<1$ when $|\mathcal{D}|>\binom{n+k}{n-1}$ and it goes to $0$ when $|\mathcal{D}|\to\infty$. Since $n+k\leq\frac{3m+n}{2}$ this inequality is satisfied for our $k$. Degenerate relations are projoin reducible with even $0$ parameters, let alone $k$, so there must be non-degenerate relations with $\mathbf{ter}>m$ on our domain. Moreover, their share approaches $1$ when $|\mathcal{D}|\to\infty$. ∎
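
For orientation, the domain-size threshold in Theorem 17 can be tabulated with a few lines of Python. The function name `domain_threshold` is hypothetical; the sketch assumes that $m$ and $n$ have the same parity, as Corollary 4 suggests for the relevant relations, so that $\frac{3m+n}{2}$ is an integer.

```python
from math import comb

def domain_threshold(n, m):
    """Binomial coefficient C((3m + n) / 2, n - 1) from Theorem 17."""
    if (3 * m + n) % 2:
        raise ValueError("m and n must have the same parity")
    return comb((3 * m + n) // 2, n - 1)

# Quaternaries: thresholds for ternarity at most 2 and at most 4
print(domain_threshold(4, 2), domain_threshold(4, 4))
```

Read with the theorem, the first value says that on domains with more than $10$ elements some quaternary has ternarity above $2$. As the remark following the proof notes, the estimate is rough, so these numbers are far from sharp.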

The estimate we used in the theorem is very rough. Indeed, $k$ is not the number of parameters in just any projoin reduction, but in a complete reduction down to ternaries. One could merge ternaries, as long as the merged factors still have arity $<n$, and reduce that number. To get a more accurate estimate one can count the number of subcubic graphs with $n$ pendants, $m$ cubic and no degree $2$ vertices, and bound the number of relations that have them as their bond graphs.

Finally, Peirce’s reduction thesis on infinite domains and bond explication show that of all ternaries only one is needed in reductions – teridentity. It is to unaries, binaries and teridentities that we should aim to reduce all relations. This suggests our next definition.

Definition 12.

$\boldsymbol{I_3}$-ternarity of a relation, denoted $\mathbf{ter}_{I_3}$, is the minimal number of teridentities in its subternaric bond reductions where the only ternaries are teridentities, and $\infty$ if no such reductions exist.

Clearly, $\mathbf{ter}_{I_3}\geq\mathbf{ter}$, since the minimum is taken over a smaller class of reductions, and $\mathbf{ter}_{I_3}=\mathbf{ter}$ on infinite domains because any ternary decomposes into binaries and teridentities by hypostatic abstraction. The next example shows that on finite domains, again, the inequality can be strict.

Example 10.

Suppose a non-degenerate ternary has a subternaric decomposition with a single teridentity. Since it has no unary factors unaries can be eliminated, and we are left with the teridentity with up to three chains of binaries attached to it in the graph. Each chain of binaries can be merged into a single one by relative products, which represents our ternary as a teridentity directly bonded with three (or fewer) binaries. Turning the teridentity into a branch point we obtain a projoin of binaries with a single parameter.

However, we showed in Example 6 that $\neg I_3$ on a domain with $|\mathcal{D}|=2,3$ is not projoin reducible with one parameter. Therefore, while $\mathbf{ter}(\neg I_3)=1$ trivially, $\mathbf{ter}_{I_3}(\neg I_3)>1$. Since $\mathbf{ter}_{I_3}(\neg I_3)$ must be odd we can conclude that $\mathbf{ter}_{I_3}(\neg I_3)\geq 3$, i.e. it takes at least $3$ teridentities to bond $\neg I_3$ on small domains, if it is possible at all.
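
For the two-element domain the star-shaped case can be confirmed by exhaustive search. The Python sketch below assumes that $\neg I_3$ denotes the complement of teridentity, i.e. the triples whose entries are not all equal, and it only tests projoins of the form $\exists\,t\,[A(x,t)\land B(y,t)\land C(z,t)]$ obtained above; it is not a substitute for the argument of Example 6, and the names `star` and `binaries` are our own.

```python
from itertools import product

D = (0, 1)
# Complement of teridentity: triples with at least two distinct entries
not_I3 = {xs for xs in product(D, repeat=3) if len(set(xs)) > 1}

# All 16 binary relations on D, each as a frozenset of pairs
binaries = [frozenset(p for p, keep in zip(product(D, repeat=2), bits) if keep)
            for bits in product((0, 1), repeat=len(D) ** 2)]

def star(A, B, C):
    """exists t [A(x, t) and B(y, t) and C(z, t)] as a set of triples."""
    return {(x, y, z) for x, y, z in product(D, repeat=3)
            if any((x, t) in A and (y, t) in B and (z, t) in C for t in D)}

# Try all 16^3 = 4096 choices of the three binaries
found = any(star(A, B, C) == not_I3
            for A, B, C in product(binaries, repeat=3))
print(found)
```

The search finds no such representation. The reason is visible by hand as well: for each value of `t` the triples form a box, a box avoiding both constant triples has at most two points, and two such boxes cannot cover the six triples of $\neg I_3$.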

10 Conclusions and open problems

We studied reduction of relations to relations of smaller arity under three relational operations: join, projoin and bond. All three can be expressed by conjunctions and existential quantification on predicates, and are motivated by algebraic analogies and practical applications in database theory. Aside from unifying and extending known reduction results and constructions, we described the sets of irreducible relations and the structure of complete reductions to them. We also clarified the relationship between projoin and bond reducibility, and the import of Peirce’s reduction thesis. Finally, we introduced the notion of ternarity that, intuitively, measures complexity of ‘relating’ in a relation, and used it to sharpen reducibility results.

Aside from concrete results, a major takeaway from this work is the striking gap between reduction behavior on finite and infinite domains. As far as we know, the only author to notice the phenomenon before was Herzberger [17]. We showed that the gap gets wider as the size of the domain grows: the share of irreducible relations with bounded number of parameters (Theorem 10) or with bounded ternarity (Theorem 17), grows with it, even though it is $0$ at $\infty$. The root cause of this discrepancy is the equality $|\mathcal{D}|=|\mathcal{D}|^2$ for infinite cardinalities, which is equivalent to the axiom of choice.

This raises a big question: to what extent does Peirce’s reduction thesis hold on finite domains? While non-degenerate ternaries are still irreducible there (even by stronger means than bonds and projoins [14]), reduction is obstructed by the lack of enough domain elements for classical constructions.

Problem 1: Are there irreducible $n$-ary relations with $n\geq 4$?

Such relations would have to have a lot of tuples, $|R|>|\mathcal{D}|^{n-2}$. Otherwise, they will have a $k$-key with $k\leq n-2$ and hypostatic abstraction will reduce them (Theorem 9). Counting arguments we used would not settle the question alone, because general projoins and bonds do not have a finite combinatorial description like Cartesian products, joins, projoins with bounded number of parameters, or bonds with bounded ternarity. On the other hand, general tests of irreducibility, like the ones for join irreducibility in Theorem 7, also seem to be elusive. A promising approach is provided by the clone theory, where one can dualize the problem into one about functional clones via the Pol-Inv Galois connection. Projoin bases of small arity for maximal sub-co-clones of the co-clone of all relations on $2$-element domains are constructed in [5]. If one could construct bases containing only unaries, binaries and ternaries for the co-clone of all relations on any finite domain that would resolve the question negatively.

While we suspect a negative answer to the first problem, it is more likely to be affirmative for the next one. Although non-degenerate ternaries are trivially ‘reducible’ to themselves on any domain, there is a non-trivial reducibility question about them. To resolve it negatively, one would need a basis of the co-clone of all relations containing only unaries and binaries.

Problem 2: Are there ternary relations indecomposable into bonds of unaries, binaries and teridentities (equivalently, projoin irreducible to unaries and binaries)?

We already saw that the strongest form of the reduction thesis, which gives the ternarity of non-degenerate $n$-ary relations as $n-2$, fails on finite domains. Even if reductions are always possible, their complexity must be higher than that of their infinite counterparts. In particular, there can be no bound on ternarity in terms of arity alone. However, since there are finitely many $n$-ary relations of finite ternarity on a finite domain, ternarity must attain a maximum on them.

Problem 3: Find sharp upper bounds on ternarity in terms of arity and the size of the domain.

We did not address the question of uniqueness, but a relation can have purely ternaric minimal reductions whose bond graphs are not even isomorphic. Indeed, bonding teridentities on any cubic graph with $n$ hanging edges produces the $n$-identity, and it is easy to construct non-isomorphic tree graphs with equal numbers of cubic vertices. Perhaps this diversity is due to the overabundance of symmetry in $I_{n}$.
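
The non-uniqueness claim can be checked by brute force on a small domain. The sketch below assumes that bonding means identifying the attributes joined by an internal edge and existentially quantifying over them; the encoding of graphs as lists of edge-label triples and the names `bond_teridentities`, `caterpillar` and `star` are hypothetical choices made for illustration.

```python
from itertools import product

def bond_teridentities(vertices, domain):
    """Bond teridentities placed at the vertices of a cubic graph.

    `vertices` is a list of 3-tuples of edge labels. A label used once is
    a hanging edge (free attribute); a label used twice is an internal
    edge (bound, existentially quantified variable).
    Returns (hanging_edge_labels, relation as a set of tuples).
    """
    counts = {}
    for v in vertices:
        for e in v:
            counts[e] = counts.get(e, 0) + 1
    hanging = sorted(e for e, c in counts.items() if c == 1)
    internal = sorted(e for e, c in counts.items() if c == 2)
    relation = set()
    for free_vals in product(domain, repeat=len(hanging)):
        env = dict(zip(hanging, free_vals))
        for bound_vals in product(domain, repeat=len(internal)):
            env.update(zip(internal, bound_vals))
            # each vertex is a teridentity: its three edges carry one value
            if all(env[a] == env[b] == env[c] for a, b, c in vertices):
                relation.add(free_vals)
                break
    return hanging, relation

def identity(n, domain):
    """The n-identity relation: all n attributes take the same value."""
    return {(d,) * n for d in domain}

# Two non-isomorphic trees, each with 4 cubic vertices and 6 hanging edges.
caterpillar = [("x1", "x2", "a"), ("a", "x3", "b"),
               ("b", "x4", "c"), ("c", "x5", "x6")]
star = [("a", "b", "c"), ("a", "x1", "x2"),
        ("b", "x3", "x4"), ("c", "x5", "x6")]

domain = range(3)
_, r_caterpillar = bond_teridentities(caterpillar, domain)
_, r_star = bond_teridentities(star, domain)
same_result = r_caterpillar == r_star == identity(6, domain)
```

In the caterpillar the internal vertices form a path, while in the star one vertex is adjacent to the other three, so the two graphs are not isomorphic even though both bonds evaluate to the same relation on the chosen domain.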

Problem 4: Are the bond graphs of purely ternaric minimal reductions unique for ‘generic’ non-degenerate relations?

One can think of minimal reductions as revealing the structure of information processing within a relation, which suggests an affirmative answer. Attributes of a relation generalize inputs and outputs of a function, and functions are commonly interpreted as information processors [23, 1.5.4]. In [29] a measure of information exchange among a relation’s attributes is introduced and studied; similar measures are studied in computational biology [36]. The intuition of ternarity as ‘complexity of relating attributes’ suggests a connection.

Problem 5: Is there an information-theoretic interpretation of ternarity, e.g. bounds on measures of information exchange in terms of it?

To summarize, logical factorization of relations poses many interesting challenges at the intersection of mathematical logic, combinatorics, graph theory and data science.

References

  • [1] S. Abiteboul, R. Hull, V. Vianu, Foundations of databases, Addison-Wesley, New York, 1995.
  • [2] A. Aho, C. Beeri, J. Ullman, Theory of joins in relational databases, ACM Transactions on Database Systems, 4 (1979) no. 3, 297-314.
  • [3] C. Beeri, R. Fagin, D. Maier, A. Mendelzon, J. Ullman, Properties of acyclic database schemes, in Proceedings of the 13th annual ACM Symposium on Theory of Computing, Milwaukee, May 11-13, 1981, 355-362.
  • [4] C. Berge, The theory of graphs, Dover, Mineola, NY, 2001.
  • [5] E. Böhler, S. Reith, H. Schnoor, H. Vollmer, Bases for Boolean co-clones, Information Processing Letters, 96 (2005) no. 2, 59-66.
  • [6] F. Börner, Basics of Galois connections, in Complexity of Constraints. Lecture Notes in Computer Science, v. 5250, Springer-Verlag, Berlin, 2008, 38-67.
  • [7] J. Brunning, C. S. Peirce’s relative product, Modern Logic, 2 (1991) no. 1, 33-49.
  • [8] R. Burch, A Peircean reduction thesis: the foundations of topological logic, Texas Tech University Press, Lubbock, TX, 1991.
  • [9] R. Burch, Peirce’s reduction thesis, in Studies in the Logic of Charles Sanders Peirce, ch. 16, Indiana University Press, 1997, 234-252.
  • [10] J. van den Bussche, Applications of Alfred Tarski’s ideas in database theory, in Computer Science Logic. Lecture Notes in Computer Science, v. 2142, Springer, Berlin, 2001, 20-37.
  • [11] T. Cao, Conceptual graphs and fuzzy logic, Springer, Berlin, 2010.
  • [12] E. Codd, A relational model of data for large shared data banks, Communications of the ACM, 13 (1970) no. 6, 377-387.
  • [13] J. Hereth Correia, F. Dau, Two instances of Peirce’s reduction thesis, in Formal Concept Analysis. Lecture Notes in Computer Science, v. 3874, Springer, Berlin, 2006, 106-118.
  • [14] J. Hereth Correia, R. Pöschel, The power of Peircean Algebraic Logic (PAL), in Concept Lattices, Second International Conference on Formal Concept Analysis, Springer, Berlin, 2004, 337-351.
  • [15] I. Düntsch, Sz. Mikulas, Cylindric structures and dependencies in relational databases, Theoretical Computer Science, 269 (2001) no. 1-2, 451-468.
  • [16] R. Fagin, Multivalued dependencies and a new normal form for relational databases, ACM Transactions on Database Systems, 2 (1977) no. 3, 262-278.
  • [17] H. Herzberger, Peirce’s remarkable theorem, in Pragmatism and Purpose: Essays Presented to Thomas A. Goudge, University of Toronto Press, 1981, 41-58.
  • [18] T. Imielinski, W. Lipski, The relational model of data and cylindric algebras, Journal of Computer and System Sciences, 28 (1984) no. 1, 80-102.
  • [19] T. Jech, The axiom of choice, Elsevier, New York, 1973.
  • [20] P. Jonsson, V. Lagerkvist, G. Nordh, B. Zanuttini, Complexity of SAT problems, clone theory and the exponential time hypothesis, in Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, 2013, 1264-1277.
  • [21] S. Koshkin, Is Peirce’s reduction thesis gerrymandered? Transactions of the Charles S. Peirce Society, 58 (2022) no. 4, 271-300.
  • [22] V. Lagerkvist, M. Wahlström, The power of primitive positive definitions with polynomially many variables, Journal of Logic and Computation, 27 (2017) no. 5, 1465-1488.
  • [23] D. Lau, Function algebras on finite sets, Springer-Verlag, Berlin, 2006.
  • [24] T. Lee, An algebraic theory of relational databases, Bell System Technical Journal, 62 (1983) no. 10, 3159-3204.
  • [25] S. Lipschutz, Set theory and related topics, McGraw Hill, New York, 1998.
  • [26] J. Listing, Der Census räumlicher Complexe, der Verallgemeinerung des Euler’schen Satzes von den Polyedern, Abhandlungen der Königlichen Gesellschaft der Wissenschaften in Göttingen, 10 (1862) 97-182.
  • [27] L. Löwenheim, Über Möglichkeiten im Relativkalkül, Mathematische Annalen 76 (1915) 447-470. English translation: On possibilities in the calculus of relatives, in From Frege to Gödel: a source book in mathematical logic 1879-1931, Harvard University Press, Cambridge, MA, 1967, 228-251.
  • [28] D. Maier, The theory of relational databases, Computer Science Press, Rockville, MD, 1983.
  • [29] F. Malvestuto, Statistical treatment of the information content of a database, Information Systems, 11 (1986) no. 3, 211-223.
  • [30] A. Mendelzon, D. Maier, Generalized mutual dependencies and the decomposition of database relations, in Fifth International Conference on Very Large Data Bases, IEEE, 1979, 75-82.
  • [31] P. Miettinen, Boolean tensor factorizations, in 11th International Conference on Data Mining, IEEE, New York, 2011, 447-456.
  • [32] P. Miettinen, Recent developments in Boolean matrix factorization, in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, 2021, 4922-4928.
  • [33] R. Pöschel, L. Kalužnin, Funktionen- und Relationenalgebren, Mathematische Monographien 15, Deutscher Verlag der Wissenschaften, Berlin, 1979.
  • [34] J. Rissanen, Independent components of relations, ACM Transactions on Database Systems, 2 (1977) no. 4, 317-325.
  • [35] T. Schaefer, The complexity of satisfiability problems, in Conference Record of the 10th Annual ACM Symposium on Theory of Computing, San Diego, 1978, 216-226.
  • [36] M. Tegmark, Improved measures of integrated information, PLoS Computational Biology, 12(11) (2016) e1005123.
  • [37] S. Tringali, An abstract factorization theorem and some applications, Journal of Algebra, 602 (2022) 352-380.
  • [38] M. Yannakakis, C. Papadimitriou, Algebraic dependencies, Journal of Computer and System Sciences, 25 (1982) no. 1, 2-41.