Relational Probabilistic Conditionals and Their Instantiations under Maximum Entropy Semantics for First-Order Knowledge Bases

Beierle, Christoph; Finthammer, Marc; Kern-Isberner, Gabriele

doi:10.3390/e17020852



by Christoph Beierle ¹,*, Marc Finthammer ¹ and Gabriele Kern-Isberner ²

¹ Faculty of Mathematics and Computer Science, University of Hagen, 58084 Hagen, Germany

² Department of Computer Science, University of Technology Dortmund, 44227 Dortmund, Germany

* Author to whom correspondence should be addressed.

Entropy 2015, 17(2), 852-865; https://doi.org/10.3390/e17020852

Submission received: 26 December 2014 / Revised: 29 January 2015 / Accepted: 9 February 2015 / Published: 13 February 2015

(This article belongs to the Special Issue Maximum Entropy Applied to Inductive Logic and Reasoning)


Abstract

For conditional probabilistic knowledge bases with conditionals based on propositional logic, the principle of maximum entropy (ME) is well-established, determining a unique model inductively completing the explicitly given knowledge. On the other hand, there is no general agreement on how to extend the ME principle to relational conditionals containing free variables. In this paper, we focus on two approaches to ME semantics that have been developed for first-order knowledge bases: aggregating semantics and a grounding semantics. Since they use different variants of conditionals, we define the logic PCI, which covers both approaches as special cases and provides a framework where the effects of both approaches can be studied in detail. While the ME models under PCI-grounding and PCI-aggregating semantics are different in general, we point out that parametric uniformity of a knowledge base ensures that both semantics coincide. Using some concrete knowledge bases, we illustrate the differences and common features of both approaches, looking in particular at the ground instances of the given conditionals.

Keywords:

conditional logic; probabilistic logic; maximum entropy; relational conditional; first-order knowledge base; instantiation restriction; grounding semantics; aggregating semantics; parametric uniformity; maximum entropy model

1. Introduction

Probabilistic conditional knowledge bases containing conditionals of the form (B|A)[d] with the reading “if A, then B with probability d” are a powerful means for knowledge representation and reasoning when uncertainty is involved [1,2]. If A and B are propositional formulas over a propositional alphabet Σ, possible worlds correspond to elementary conjunctions over Σ, where an elementary conjunction is a conjunction containing every element of Σ exactly once, either in non-negated or in negated form. A possible worlds semantics is given by probability distributions over the set of possible worlds, and a probability distribution P satisfies (B|A)[d] if for the conditional probability

P (B | A) = \frac{P (A \land B)}{P (A)}

the relation P(B|A) = d holds (which presupposes P(A) > 0). For a knowledge base ℛ consisting of a set of propositional conditionals, P is a model of ℛ if P satisfies each conditional in ℛ. The principle of maximum entropy (ME principle) is a well-established concept for choosing, among all models of a consistent ℛ, the uniquely determined one having maximum entropy. This model is the most unbiased model of ℛ in the sense that it completes the knowledge given by ℛ inductively but adds as little additional information as possible [3–9].
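As a small worked illustration of the propositional case (our own example, not taken from the paper), let Σ = {a, b} consist of two atoms and let ℛ = {(b|a)[d]} with 0 < d < 1. The ME model has the usual exponential form for linear constraints: worlds verifying the conditional get a factor α^{1−d}, worlds falsifying it get α^{−d}, and worlds not satisfying the antecedent are left untouched. The constraint P(a ∧ b) = d · P(a), i.e., (1 − d) P(a ∧ b) = d P(a ∧ ¬b), then fixes α:

```latex
P^{*}(\omega) = \frac{1}{Z}
\begin{cases}
\alpha^{1-d} & \text{if } \omega = a b \\
\alpha^{-d}  & \text{if } \omega = a \bar{b} \\
1            & \text{if } \omega \in \{\bar{a} b,\ \bar{a}\bar{b}\}
\end{cases}
\qquad Z = \alpha^{1-d} + \alpha^{-d} + 2

(1-d)\,\alpha^{1-d} = d\,\alpha^{-d}
\;\Longrightarrow\;
\alpha = \frac{d}{1-d},
\qquad
P^{*}(b \mid a) = \frac{\alpha^{1-d}}{\alpha^{1-d} + \alpha^{-d}} = \frac{\alpha}{1+\alpha} = d
```

For d = 0.9 this gives α = 9. Note that P*(ā b) = P*(ā b̄): the ME model adds no information about b in worlds where the antecedent a is false.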

While for a set of propositional conditionals there is a general agreement about its ME model, the situation changes when the conditionals are built over a relational first-order language. As an illustration, consider the following example.

Example 1 (Elephant Keeper). The elephant keeper example, adapted from [10,11], models the relationships among elephants in a zoo and their keepers. Elephants usually like their keepers, except for keeper Fred. However, elephant Clyde gets along with everyone, and therefore he also likes Fred. The knowledge base ℛ_EK consists of the following conditionals:

\begin{array}{l} ek_{1} : (\mathit{likes}(E, K) \mid \mathit{elephant}(E), \mathit{keeper}(K))[0.9] \\ ek_{2} : (\mathit{likes}(E, \mathit{fred}) \mid \mathit{elephant}(E), \mathit{keeper}(\mathit{fred}))[0.05] \\ ek_{3} : (\mathit{likes}(\mathit{clyde}, \mathit{fred}) \mid \mathit{elephant}(\mathit{clyde}), \mathit{keeper}(\mathit{fred}))[0.85] \end{array}

Conditional ek₁ models statistical knowledge about the general relationship between elephants and their keepers, whereas conditional ek₂ represents knowledge about the exceptional keeper Fred and his relationship to elephants in general. Conditional ek₃ models the subjective belief about the relationship between the elephant Clyde and keeper Fred. From a common-sense point of view, the knowledge base ℛ_EK makes perfect sense: conditional ek₂ is an exception to ek₁, and ek₃ is an exception to ek₂.

When trying to extend the ME principle from the propositional case to such a relational setting, a central question is how to interpret the free variables occurring in a conditional. For instance, a straightforward complete grounding of ℛ_EK yields a ground knowledge base that can be viewed as a propositional knowledge base. However, this ground knowledge base is inconsistent, since it contains both (likes(clyde, fred)|elephant(clyde), keeper(fred))[0.9] and (likes(clyde, fred)|elephant(clyde), keeper(fred))[0.05], and no probability distribution P can satisfy both P(likes(clyde, fred)|elephant(clyde), keeper(fred)) = 0.9 and P(likes(clyde, fred)|elephant(clyde), keeper(fred)) = 0.05.

Thus, when extending the ME principle to the relational case with free variables as in ℛ_EK, the exact role of the variables has to be specified. There are various approaches combining probabilities with a first-order language (e.g., [12,13]; a comparison and evaluation of some approaches is given in [14]). In the following, we focus on two semantics that both employ the principle of maximum entropy for probabilistic relational conditionals: the aggregation semantics [15] proposed by Kern-Isberner and the logic FO-PCL [16] elaborated by Fisseler. While both approaches are related in the sense that they refer to a set of constants when interpreting the variables in the conditionals, there is also a major difference. FO-PCL requires all groundings of a conditional to have the same probability d given in the conditional, and, in general, FO-PCL needs to restrict the possible instantiations for the variables occurring in a conditional by providing constraint formulas like U ≠ V or U ≠ a in order to avoid inconsistencies. Under aggregation semantics, on the other hand, the ground instances may have distinct probabilities as long as they aggregate to the given probability d, and aggregation semantics is defined only for conditionals without constraint formulas.

In this paper, we propose a logical framework PCI that extends aggregation semantics to conditionals with instantiation restrictions and also provides a grounding semantics. From a knowledge representation point of view, this provides greater flexibility, e.g., when expressing knowledge about individuals known to be exceptional with respect to some relationship. We show that both the aggregation semantics of [15] and the semantics of FO-PCL [16] come out as special cases of PCI, thereby also helping to clarify the relationship between the two approaches. Moreover, we investigate the ME models under PCI-grounding and PCI-aggregating semantics, which are different in general, and we give a condition on knowledge bases ensuring that both ME semantics coincide.

This paper is a revised and extended version of [17] and is organized as follows. In Section 2, we briefly recall the background of FO-PCL and aggregation semantics. In Section 3, the logical framework PCI is developed, and two alternative satisfaction relations for grounding and aggregating semantics are defined for PCI by extending the corresponding notions of [15,16]. In Section 4, the maximum entropy principle is employed with respect to these satisfaction relations; we show that the resulting semantics coincide for knowledge bases that are parametrically uniform [11,16]. In Section 5, we present and discuss ME distributions for some concrete knowledge bases under both PCI-grounding and PCI-aggregating semantics, and point out their differences and common features, covering in particular the groundings of the given conditionals. Finally, in Section 6, we conclude and point out further work.

2. Background: FO-PCL and Aggregation Semantics

As already pointed out in Section 1, simply grounding a relational knowledge base ℛ easily leads to inconsistency. Therefore, the logic FO-PCL [11,16] employs instantiation restrictions for the free variables of a conditional. An FO-PCL conditional additionally has a constraint formula determining the admissible instantiations of its free variables, and the grounding semantics of FO-PCL requires that all admissible ground instances of a conditional c have the probability given by c.

Example 2 (Elephant Keeper with instantiation restrictions). In FO-PCL, adding K ≠ fred to conditional ek₁ and E ≠ clyde to conditional ek₂ in ℛ_EK yields the knowledge base ℛ′_EK with:

\begin{array}{l} ek'_{1} : \langle (\mathit{likes}(E, K) \mid \mathit{elephant}(E), \mathit{keeper}(K))[0.9],\; K \neq \mathit{fred} \rangle \\ ek'_{2} : \langle (\mathit{likes}(E, \mathit{fred}) \mid \mathit{elephant}(E), \mathit{keeper}(\mathit{fred}))[0.05],\; E \neq \mathit{clyde} \rangle \\ ek'_{3} : \langle (\mathit{likes}(\mathit{clyde}, \mathit{fred}) \mid \mathit{elephant}(\mathit{clyde}), \mathit{keeper}(\mathit{fred}))[0.85],\; \top \rangle \end{array}

Note that, e.g., the ground instance (likes(clyde, fred)|elephant(clyde), keeper(fred))[0.05] of conditional ek′₂ is not admissible, and that the set of admissible ground instances of ℛ′_EK is indeed consistent under probabilistic semantics for propositional knowledge bases as considered, e.g., in [8,18].
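The difference between complete and admissible grounding can be made tangible with a short Python sketch (our own illustration, not part of FO-PCL or PCI tooling). It assumes two sorts, elephants and keepers, so that E only ranges over elephants and K only over keepers; the constants dumbo and hank are hypothetical additions so that each variable has more than one value. The sketch only detects the most blatant kind of inconsistency discussed in Section 1, namely one ground conditional receiving several probabilities; the absence of such clashes does not by itself prove consistency.

```python
from collections import defaultdict
from itertools import product

# clyde and fred come from Example 1; dumbo and hank are hypothetical extra constants.
ELEPHANTS = ["clyde", "dumbo"]
KEEPERS = ["fred", "hank"]


def ground(restricted):
    """Yield ((consequent, antecedent), probability) for all ground instances of R_EK,
    or only for the admissible ground instances of R'_EK if restricted is True."""
    # ek1 / ek'1: (likes(E,K) | elephant(E), keeper(K))[0.9], restriction K != fred
    for e, k in product(ELEPHANTS, KEEPERS):
        if not (restricted and k == "fred"):
            yield (f"likes({e},{k})", f"elephant({e}),keeper({k})"), 0.9
    # ek2 / ek'2: (likes(E,fred) | elephant(E), keeper(fred))[0.05], restriction E != clyde
    for e in ELEPHANTS:
        if not (restricted and e == "clyde"):
            yield (f"likes({e},fred)", f"elephant({e}),keeper(fred)"), 0.05
    # ek3 / ek'3: already ground, trivial restriction (top)
    yield ("likes(clyde,fred)", "elephant(clyde),keeper(fred)"), 0.85


def clashes(restricted):
    """Return the ground conditionals (B|A) that are assigned more than one probability."""
    probs = defaultdict(set)
    for conditional, d in ground(restricted):
        probs[conditional].add(d)
    return {c: ds for c, ds in probs.items() if len(ds) > 1}


print(clashes(restricted=False))  # clashes for likes(clyde,fred) and likes(dumbo,fred)
print(clashes(restricted=True))   # {}
```

With complete grounding, both clyde/fred and dumbo/fred receive conflicting probabilities; with the restrictions of ℛ′_EK, every admissible ground conditional gets exactly one probability.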

Thus, under FO-PCL semantics, ℛ′_EK is consistent, where a probability distribution P satisfies an FO-PCL conditional r, denoted by P |=_fopcl r, iff all admissible ground instances of r have the probability specified by r.

In contrast, the aggregation semantics as given in [15] does not consider instantiation restrictions, since its satisfaction relation (in this paper denoted by |=^{no-ir}_⊙ to indicate that there is no instantiation restriction) is less strict with respect to the probabilities of ground instances: P |=^{no-ir}_⊙ (B|A)[d] iff the quotient of the sum of all probabilities P(A_i ∧ B_i) and the sum of all P(A_i) is d, where (B₁|A₁), …, (B_n|A_n) are the ground instances of (B|A), i.e.,

\frac{\sum_{i=1}^{n} P(A_i \land B_i)}{\sum_{i=1}^{n} P(A_i)} = d

In this way, the aggregation semantics is capable of balancing the probabilities of ground instances, resulting in greater flexibility and higher tolerance with respect to consistency issues. Provided that there are enough individuals so that the corresponding aggregation over all probabilities is possible, the knowledge base ℛ_EK, which is inconsistent under FO-PCL semantics, is consistent under aggregation semantics.

3. PCI Logic

The logical framework PCI (probabilistic conditionals with instantiation restrictions) uses probabilistic conditionals with and without instantiation restrictions and provides different options for a satisfaction relation. The syntax of PCI given in [19] uses the syntax of FO-PCL [11,16]. In the following, we will precisely state the formal relationship among |=^{no-ir}_⊙, |=_fopcl, and the satisfaction relations offered by PCI.

Like FO-PCL, PCI uses function-free, sorted signatures of the form Σ = (S, D, Pred). In a PCI signature Σ = (S, D, Pred), S = {s₁, …, s_k} is a set of sort names, or just sorts. The set D is a finite set of constant symbols, where each d ∈ D has a unique sort s ∈ S. With D^(s) we denote the set of all constants having sort s; thus D = ⋃_{s∈S} D^(s) is the union of (disjoint) sets of sorted constant symbols. Pred is a set of predicate symbols, each having a particular number of arguments. If p ∈ Pred is a predicate taking n arguments, each argument position i must be filled with a constant or variable of a specific sort s_i. Thus, each p ∈ Pred comes with an arity of the form s₁ × … × s_n ∈ S^n indicating the required sorts for the arguments. Variables in V also have a unique sort, and all formulas and variable substitutions must obey the obvious sort restrictions. In the following, we adopt the unique names assumption, i.e., different constants denote different elements. The set of all terms is defined as Term_Σ := V ∪ D. Let ℒ_Σ be the set of quantifier-free first-order formulas defined over Σ and V in the usual way.

Definition 1 (Instantiation Restriction). An instantiation restriction is a conjunction of inequality atoms of the form t₁ ≠ t₂ with t₁, t₂ ∈ Term_Σ. The set of all instantiation restrictions is denoted by C_Σ.

Since an instantiation restriction may be a conjunction of inequality atoms, we can express that a conditional has multiple restrictions, e.g., by stating E ≠ clyde ∧ K ≠ fred.

Definition 2 (q-, p-, r-Conditional). Let A, B ∈ ℒ_Σ be quantifier-free first-order formulas over Σ and V.

(B|A) is called a qualitative conditional (or just q-conditional), where A is the antecedent and B the consequent of the qualitative conditional. The set of all qualitative conditionals over ℒ_Σ is denoted by (ℒ_Σ|ℒ_Σ).
Let (B|A) ∈ (ℒ_Σ|ℒ_Σ) be a qualitative conditional and let d ∈ [0, 1] be a real value. Then (B|A)[d] is called a probabilistic conditional (or just p-conditional) with probability d. The set of all probabilistic conditionals over ℒ_Σ is denoted by (ℒ_Σ|ℒ_Σ)^prob.
Let (B|A)[d] ∈ (ℒ_Σ|ℒ_Σ)^prob be a probabilistic conditional and let C ∈ C_Σ be an instantiation restriction. Then 〈(B|A)[d], C〉 is called an instantiation restricted conditional (or just r-conditional). The set of all instantiation restricted conditionals over ℒ_Σ is denoted by (ℒ_Σ|ℒ_Σ)^prob_{C_Σ}.

Instantiation restricted qualitative conditionals are defined analogously. If it is clear from the context, we may omit qualitative, probabilistic, and instantiation restricted and just use the term conditional.

Definition 3 (PCI knowledge base). A pair (Σ, ℛ) consisting of a PCI signature Σ = (S, D, Pred) and a set of instantiation restricted conditionals ℛ = {r₁, …, r_m} with r_i ∈ (ℒ_Σ|ℒ_Σ)^prob_{C_Σ} is called a PCI knowledge base.

For an instantiation restricted conditional r = 〈(B|A)[d], C〉, Θ_Σ(r) denotes the set of all ground substitutions with respect to the variables in r. A ground substitution θ ∈ Θ_Σ(r) is applied to the formulas A, B, and C in the usual way, i.e., each variable is replaced by a certain constant according to the mapping θ = {v₁/c₁, …, v_l/c_l} with v_i ∈ V, c_i ∈ D, 1 ≤ i ≤ l. Therefore, θ(A), θ(B), and θ(C) are ground formulas, and we have θ((B|A)) := (θ(B)|θ(A)).

Given a ground substitution θ over the variables occurring in an instantiation restriction C ∈ C_Σ, the evaluation of C under θ, denoted by ⟦C⟧_θ, yields true iff θ(t₁) and θ(t₂) are different constants for all inequality atoms t₁ ≠ t₂ in C.

Definition 4 (Admissible Ground Substitutions and Instances). Let Σ = (S, D, Pred) be a many-sorted signature and let r = 〈(B|A)[d], C〉 ∈ (ℒ_Σ|ℒ_Σ)^prob_{C_Σ} be an instantiation restricted conditional. The set of admissible ground substitutions of r is defined as

\Theta_{\Sigma}^{adm}(r) := \{ \theta \in \Theta_{\Sigma}(r) \mid [\![ C ]\!]_{\theta} = \mathit{true} \}

The set of admissible ground instances of r is defined as

\mathit{gnd}_{\Sigma}(r) := \{ \theta((B \mid A))[d] \mid \theta \in \Theta_{\Sigma}^{adm}(r) \}

In the following, when we talk about the ground instances of a conditional, we will always refer to its admissible ground instances.

As for an FO-PCL knowledge base [11], for a PCI knowledge base (Σ, ℛ) we define the Herbrand base ℋ(ℛ) as the set of all ground atoms in all gnd_Σ(r_i) with r_i ∈ ℛ. Every subset ω ⊆ ℋ(ℛ) is a Herbrand interpretation, defining a logical semantics for ℛ. The set Ω_Σ := {ω | ω ⊆ ℋ(ℛ)} denotes the set of all Herbrand interpretations. Herbrand interpretations are also called possible worlds.
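Definition 4 and the Herbrand base can be illustrated with a small Python sketch. The representation used here (variables mapped to sorts, an instantiation restriction as a list of inequality pairs, and a callable producing the ground atoms of θ(A) and θ(B)) is our own assumption and not the data structure of any existing PCI or FO-PCL implementation. It is applied to the misanthrope knowledge base ℛ_MI of Example 3 with D = {a, b, c}, treating all constants as a single hypothetical sort person.

```python
from itertools import product

# Hypothetical single-sorted domain for the misanthrope knowledge base of Example 3.
DOMAIN = {"person": ["a", "b", "c"]}


def substitute(theta, term):
    """Apply a ground substitution to a term; constants are left unchanged.
    Assumes variable names and constant names are disjoint."""
    return theta.get(term, term)


def admissible_substitutions(variables, restriction):
    """Theta^adm(r): sort-respecting ground substitutions theta with [[C]]_theta = true.

    variables   -- dict mapping each variable of r to its sort
    restriction -- list of pairs (t1, t2), read as the conjunction of all t1 != t2
    """
    names = sorted(variables)
    admissible = []
    for values in product(*(DOMAIN[variables[v]] for v in names)):
        theta = dict(zip(names, values))
        if all(substitute(theta, t1) != substitute(theta, t2) for t1, t2 in restriction):
            admissible.append(theta)
    return admissible


# R1 = <(likes(U,V) | likes(V,U))[0.9], U != V> and R2 = <(likes(a,V) | T)[0.05], V != a>;
# "atoms" lists the ground atoms of theta(A) and theta(B) for a substitution theta.
R1 = {"vars": {"U": "person", "V": "person"}, "C": [("U", "V")],
      "atoms": lambda th: [("likes", th["U"], th["V"]), ("likes", th["V"], th["U"])]}
R2 = {"vars": {"V": "person"}, "C": [("a", "V")],
      "atoms": lambda th: [("likes", "a", th["V"])]}


def herbrand_base(kb):
    """All ground atoms occurring in the admissible ground instances of the conditionals in kb."""
    return {atom
            for r in kb
            for th in admissible_substitutions(r["vars"], r["C"])
            for atom in r["atoms"](th)}


H = herbrand_base([R1, R2])
print(len(admissible_substitutions(R1["vars"], R1["C"])))  # 6
print(len(admissible_substitutions(R2["vars"], R2["C"])))  # 2
print(len(H), 2 ** len(H))  # 6 64
```

The six admissible instances of R₁ already cover all atoms likes(x, y) with x ≠ y, so R₂ adds no new atoms, and there are 2⁶ = 64 possible worlds. A conditional without variables, such as ek′₃, has exactly one (empty) ground substitution.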

Definition 5 (PCI Interpretation). The probabilistic semantics of (Σ, ℛ) is a possible worlds semantics [12] where the ground atoms in ℋ(ℛ) are binary random variables. A PCI interpretation P of a knowledge base (Σ, ℛ) is thus a probability distribution P : Ω_Σ → [0, 1]. The set of all probability distributions over Ω_Σ is denoted by 𝒫_{Ω_Σ} or just by 𝒫_Ω.

The PCI framework offers two different satisfaction relations: |=^pci_△ is based on grounding as in FO-PCL, and |=^pci_⊛ extends aggregation semantics to r-conditionals.

Definition 6 (PCI Satisfaction Relations). Let P ∈ 𝒫_Ω and let 〈(B|A)[d], C〉 ∈ (ℒ_Σ|ℒ_Σ)^prob_{C_Σ} be an r-conditional with

\sum_{\theta \in \Theta_{\Sigma}^{adm}(\langle (B \mid A)[d], C \rangle)} P(\theta(A)) > 0

The two PCI satisfaction relations |=^pci_△ and |=^pci_⊛ are defined by:

P \models_{\triangle}^{pci} \langle (B \mid A)[d], C \rangle \;\text{ iff }\; \frac{P(\theta(A \land B))}{P(\theta(A))} = d \;\text{ for all }\; \theta \in \Theta_{\Sigma}^{adm}(\langle (B \mid A)[d], C \rangle)

(1)

P \models_{\circledast}^{pci} \langle (B \mid A)[d], C \rangle \;\text{ iff }\; \frac{\sum_{\theta \in \Theta_{\Sigma}^{adm}(\langle (B \mid A)[d], C \rangle)} P(\theta(A \land B))}{\sum_{\theta \in \Theta_{\Sigma}^{adm}(\langle (B \mid A)[d], C \rangle)} P(\theta(A))} = d

(2)

We say that P satisfies 〈(B|A)[d], C〉 under PCI-grounding semantics iff P |=^pci_△ 〈(B|A)[d], C〉. Correspondingly, P satisfies 〈(B|A)[d], C〉 under PCI-aggregation semantics iff P |=^pci_⊛ 〈(B|A)[d], C〉.

As usual, the satisfaction relations |=^pci_★ with ★ ∈ {△, ⊛} are extended to a set of conditionals ℛ by defining

P \models_{\star}^{pci} \mathcal{R} \;\text{ iff }\; P \models_{\star}^{pci} r \;\text{ for all }\; r \in \mathcal{R}
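To contrast Equations (1) and (2), the following Python sketch checks both satisfaction relations on an explicitly given distribution. The example is a toy of our own with abstract atoms: a conditional with two admissible ground instances θ₁(B|A) = (B₁|A₁) and θ₂(B|A) = (B₂|A₂), where worlds are tuples (a1, b1, a2, b2) and the two pairs of atoms are independent. The instances have conditional probabilities 0.8 and 1.0 but equal antecedent probability 0.5, so they aggregate to 0.9. The grounding check additionally assumes P(θ(A)) > 0 for every instance.

```python
from itertools import product


def prob(P, event):
    """Probability of the set of worlds satisfying the predicate `event`."""
    return sum(p for w, p in P.items() if event(w))


def satisfies_grounding(P, instances, d, tol=1e-9):
    """Eq. (1): every admissible ground instance (A, B) has P(B|A) = d.
    Assumes P(A) > 0 for each instance."""
    return all(abs(prob(P, lambda w: a(w) and b(w)) / prob(P, a) - d) <= tol
               for a, b in instances)


def satisfies_aggregation(P, instances, d, tol=1e-9):
    """Eq. (2): sum of P(A and B) over instances divided by sum of P(A) equals d."""
    num = sum(prob(P, lambda w: a(w) and b(w)) for a, b in instances)
    den = sum(prob(P, a) for a, _ in instances)
    return abs(num / den - d) <= tol


# Toy distribution over worlds (a1, b1, a2, b2), built from two independent blocks.
block1 = {(True, True): 0.4, (True, False): 0.1, (False, True): 0.25, (False, False): 0.25}  # P(B1|A1) = 0.8
block2 = {(True, True): 0.5, (True, False): 0.0, (False, True): 0.25, (False, False): 0.25}  # P(B2|A2) = 1.0
P = {k1 + k2: block1[k1] * block2[k2] for k1, k2 in product(block1, block2)}
instances = [(lambda w: w[0], lambda w: w[1]), (lambda w: w[2], lambda w: w[3])]

print(satisfies_grounding(P, instances, 0.9))    # False: 0.8 and 1.0 both differ from 0.9
print(satisfies_aggregation(P, instances, 0.9))  # True: (0.4 + 0.5) / (0.5 + 0.5) = 0.9
```

The same distribution thus satisfies the r-conditional under PCI-aggregation semantics but not under PCI-grounding semantics, which mirrors the greater flexibility of aggregation described in Section 2.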

The following proposition states that PCI properly captures both the instantiation-based semantics |=_fopcl of FO-PCL [11] and the aggregation semantics |=^{no-ir}_⊙ of [15] (cf. Section 2).

Proposition 1 (PCI captures FO-PCL and aggregation semantics [19]). Let 〈(B|A)[d], C〉 be an r-conditional and let (B|A)[d] be a p-conditional, respectively. Then the following holds:

P \models_{\triangle}^{pci} \langle (B \mid A)[d], C \rangle \;\text{ iff }\; P \models_{fopcl} \langle (B \mid A)[d], C \rangle

(3)

P \models_{\circledast}^{pci} \langle (B \mid A)[d], \top \rangle \;\text{ iff }\; P \models_{\odot}^{no\text{-}ir} (B \mid A)[d]

(4)

4. PCI Logic and Maximum Entropy Semantics

If a knowledge base ℛ is consistent, there are usually many different models satisfying ℛ. The principle of maximum entropy chooses the unique distribution that has maximum entropy among all distributions satisfying a knowledge base ℛ [5,8]. Applying this principle to the PCI satisfaction relations |=^pci_△ and |=^pci_⊛ yields

P_{\mathcal{R}}^{ME_{\star}} = \arg\max_{P \in \mathcal{P}_{\Omega} \,:\, P \models_{\star}^{pci} \mathcal{R}} H(P)

(5)

with ★ being △ or ⊛, and where

H(P) = -\sum_{\omega \in \Omega} P(\omega) \log P(\omega)

is the entropy of a probability distribution P. In the following, we also write ME_★(ℛ) for P_ℛ^{ME_★}.

Example 3 (Misanthrope). The knowledge base ℛ_MI = {R₁, R₂}, adapted from [11], models friendship relations within a group of people with one exceptional member, a misanthrope. In general, if a person V likes another person U, then it is very likely that U likes V, too. However, there is one person, the misanthrope, who generally does not like other people:

\begin{array}{l} R_{1} : \langle (\mathit{likes}(U, V) \mid \mathit{likes}(V, U))[0.9],\; U \neq V \rangle \\ R_{2} : \langle (\mathit{likes}(a, V) \mid \top)[0.05],\; V \neq a \rangle \end{array}

Within the PCI framework, consider ℛ_MI together with the constants D = {a, b, c} and the corresponding ME distributions ME_△(ℛ_MI) and ME_⊛(ℛ_MI) under PCI-grounding and PCI-aggregation semantics, respectively.

Under ME_△(ℛ_MI), all six ground conditionals emerging from R₁ have probability 0.9; for instance, ME_△(ℛ_MI)(likes(a, b) | likes(b, a)) = 0.9.

On the other hand, for the distribution ME_⊛(ℛ_MI), we have ME_⊛(ℛ_MI)(likes(a, b) | likes(b, a)) = 0.46016768 and ME_⊛(ℛ_MI)(likes(a, c) | likes(c, a)) = 0.46016768, while the other four ground conditionals resulting from R₁ have probability 0.96674480.
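Both ME distributions of Example 3 can be approximated with a short, self-contained Python sketch. It is not claimed to be the algorithm used by KReator. Each (ground or aggregated) conditional is encoded as a feature f with E_P[f] = 0, where each verifying instance contributes 1 − d and each falsifying instance −d; the sketch then applies cyclic I-projections (each step solved by bisection on the Lagrange multiplier) starting from the uniform distribution. For a jointly satisfiable set of linear constraints on a finite set of worlds, such iterated projections are known to converge to the ME model; the tolerance and iteration limits below are arbitrary choices.

```python
import itertools
import math

# Herbrand base of R_MI over D = {a, b, c}: likes(u, v) for u != v (6 atoms, 64 worlds).
CONSTS = ["a", "b", "c"]
ATOMS = [(u, v) for u in CONSTS for v in CONSTS if u != v]
INDEX = {atom: i for i, atom in enumerate(ATOMS)}
WORLDS = list(itertools.product((False, True), repeat=len(ATOMS)))


def likes(u, v):
    """The event 'likes(u, v) holds' as a predicate on worlds."""
    return lambda w: w[INDEX[(u, v)]]


def top(w):
    """The tautology, used as the antecedent of R2."""
    return True


def feature(instances, d):
    """f with E_P[f] = 0 iff sum of P(A and B) = d * sum of P(A) over the given instances."""
    def f(w):
        return sum((1 - d) if con(w) else -d for ant, con in instances if ant(w))
    return f


def me_distribution(features, tol=1e-10, max_sweeps=5000):
    """Approximate the ME distribution subject to E_P[f] = 0 via cyclic I-projections."""
    p = [1.0 / len(WORLDS)] * len(WORLDS)
    table = [[f(w) for w in WORLDS] for f in features]
    for _ in range(max_sweeps):
        for fv in table:
            mass = {}  # current probability mass, grouped by feature value
            for pw, v in zip(p, fv):
                mass[v] = mass.get(v, 0.0) + pw

            def g(lam):  # proportional to E_Q[f] for Q ~ p * exp(lam * f); increasing in lam
                return sum(m * v * math.exp(lam * v) for v, m in mass.items())

            lo, hi = -1.0, 1.0
            while g(lo) > 0:
                lo *= 2
            while g(hi) < 0:
                hi *= 2
            for _ in range(80):  # bisection for the Lagrange multiplier
                mid = (lo + hi) / 2
                lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
            lam = (lo + hi) / 2
            p = [pw * math.exp(lam * v) for pw, v in zip(p, fv)]
            z = sum(p)
            p = [pw / z for pw in p]
        if max(abs(sum(pw * v for pw, v in zip(p, fv))) for fv in table) < tol:
            break
    return p


def prob(p, event):
    return sum(pw for w, pw in zip(WORLDS, p) if event(w))


def cond(p, ant, con):
    return prob(p, lambda w: ant(w) and con(w)) / prob(p, ant)


# Admissible ground instances (antecedent, consequent) of R1 (U != V) and R2 (V != a).
R1 = [(likes(v, u), likes(u, v)) for (u, v) in ATOMS]
R2 = [(top, likes("a", v)) for v in CONSTS if v != "a"]

p_gnd = me_distribution([feature([i], 0.9) for i in R1] + [feature([i], 0.05) for i in R2])
p_agg = me_distribution([feature(R1, 0.9), feature(R2, 0.05)])

print(cond(p_gnd, likes("b", "a"), likes("a", "b")))  # about 0.9 (each instance is enforced)
print(cond(p_agg, likes("b", "a"), likes("a", "b")))  # Example 3 reports 0.46016768
print(cond(p_agg, likes("c", "b"), likes("b", "c")))  # Example 3 reports 0.96674480
```

Grouping the probability mass by feature value keeps each bisection step cheap, and the same routine serves both semantics: PCI-grounding passes one feature per ground instance, whereas PCI-aggregation passes one summed feature per conditional.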

Example 3 shows that, in general, the ME model of a knowledge base ℛ under PCI-grounding semantics differs from its ME model under PCI-aggregation semantics. However, if ℛ is parametrically uniform [11,16], the situation changes. Parametric uniformity of a knowledge base ℛ is introduced in [11] and refers to the fact that the ME distribution under FO-PCL (or PCI-grounding) semantics satisfying a set of m relational conditionals can be represented by a set of just m optimization parameters. A relational knowledge base ℛ is parametrically uniform iff for every conditional r ∈ ℛ, all ground instances of r have the same optimization parameter (see [11,16] for details). For instance, the knowledge base ℛ′_EK from Example 2 is parametrically uniform, while the knowledge base ℛ_MI from Example 3 is not. Thus, if ℛ is parametrically uniform, just one optimization parameter for each conditional r ∈ ℛ has to be computed instead of one optimization parameter for each ground instance of r; this can be exploited when computing the ME distribution [17]. In [20], a set of transformation rules is developed that transforms any consistent knowledge base ℛ into a knowledge base ℛ′ such that ℛ and ℛ′ have the same ME model under grounding semantics and ℛ′ is parametrically uniform.

Within the PCI framework, which provides both grounding and aggregating semantics for conditionals with instantiation restrictions, the ME models for PCI-grounding and PCI-aggregation semantics coincide if ℛ is parametrically uniform.

Proposition 2 ([19]). Let ℛ be a PCI knowledge base. If ℛ is parametrically uniform, then ME_△(ℛ) = ME_⊛(ℛ).

Thus, while in general ME_△(ℛ) ≠ ME_⊛(ℛ), parametric uniformity of ℛ ensures that ME_△(ℛ) = ME_⊛(ℛ).
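A short argument using the standard exponential representation of ME models under linear constraints (our own sketch, not the proof given in [19]) shows why parametric uniformity makes the two models coincide. For r = 〈(B_r|A_r)[d_r], C_r〉 ∈ ℛ, the ME model under PCI-grounding semantics has one factor α_{r,θ} per admissible ground instance:

```latex
\mathit{ME}_{\triangle}(\mathcal{R})(\omega)
  = \alpha_{0} \prod_{r \in \mathcal{R}} \; \prod_{\theta \in \Theta_{\Sigma}^{adm}(r)} \alpha_{r,\theta}^{\,f_{r,\theta}(\omega)},
\qquad
f_{r,\theta}(\omega) =
\begin{cases}
1 - d_{r} & \text{if } \omega \models \theta(A_{r} \land B_{r}) \\
- d_{r}   & \text{if } \omega \models \theta(A_{r} \land \lnot B_{r}) \\
0         & \text{otherwise}
\end{cases}

\alpha_{r,\theta} = \alpha_{r} \text{ for all } \theta \text{ (parametric uniformity)}
\;\Longrightarrow\;
\mathit{ME}_{\triangle}(\mathcal{R})(\omega)
  = \alpha_{0} \prod_{r \in \mathcal{R}} \alpha_{r}^{\,\sum_{\theta \in \Theta_{\Sigma}^{adm}(r)} f_{r,\theta}(\omega)}
```

The right-hand side has exactly the exponential form of an ME model for the aggregated constraints Σ_θ E_P[f_{r,θ}] = 0, which are equivalent to Equation (2). Since ME_△(ℛ) satisfies every ground instance, it also satisfies these aggregated constraints, and (assuming strictly positive models) the ME model of a consistent set of linear constraints is the unique distribution of this form satisfying them; hence ME_△(ℛ) = ME_⊛(ℛ).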

5. Computation and Comparison of Maximum Entropy Distributions

In Example 3, we already presented some concrete probability values for ME distributions. We will now look in more detail at the ME distributions obtained under PCI-grounding and PCI-aggregation semantics. In particular, we will illustrate how the ME distributions for PCI-grounding and PCI-aggregation semantics evolve when a knowledge base that is not parametrically uniform is transformed into one that is parametrically uniform.

5.1. Achieving Parametric Uniformity

While transforming a knowledge base into one that is parametrically uniform [11] does not change its ME model under (FO-PCL or PCI) grounding semantics, it allows for a simpler ME model computation [17]. In [20], a set of transformation rules PU is presented that allows transforming any consistent knowledge base ℛ into a parametrically uniform knowledge base PU(ℛ) with the same maximum entropy model under grounding semantics. An implementation of PU [21] is available within the KReator environment (KReator can be found at http://kreator-ide.sourceforge.net/), an integrated development environment for relational probabilistic logic [22]. The CSPU (Conditional Structures and Parametric Uniformity) component [23] of KReator generates PU transformation protocols, and a part of the protocol for the misanthrope knowledge base ℛ_MI from Example 3 is shown in Figure 1. For details of the PU transformation rules, we refer to [20]; we just remark here that PU stepwise removes all interactions among the conditionals, where an interaction in a knowledge base ℛ indicates that ℛ is not parametrically uniform [20]. In each PU transformation step, one conditional R is replaced by two conditionals R₁, R₂ originating from R. Table 1 illustrates how ℛ_MI evolves from ℛ_MI = ℛ₁ to ℛ₂ and from ℛ₂ to ℛ₃ = PU(ℛ_MI).

5.2. Maximum Entropy Distributions for Grounding and Aggregation Semantics

Using KReator, we computed the ME distributions for the three knowledge bases ℛ₁, ℛ₂, and ℛ₃ involved in the PU transformation of ℛ_MI, for both PCI-grounding and PCI-aggregation semantics. For all admissible ground instances of the conditionals occurring in ℛ₁, ℛ₂, and ℛ₃, we computed their probability under the ME distributions for PCI-grounding and PCI-aggregation semantics. The results are shown in Table 2, using the abbreviation l(x, y) for likes(x, y).

There are three pairwise different ME distributions (i.e., ME_⊛(ℛ₁), ME_⊛(ℛ₂), ME_⊛(ℛ₃)) under PCI-aggregation semantics for the three pairwise different knowledge bases ℛ₁, ℛ₂, ℛ₃. On the other hand, ME_△(ℛ₁) = ME_△(ℛ₂) = ME_△(ℛ₃) = ME_⊛(ℛ₃) holds, since the PU transformation process does not change the maximum entropy model under PCI-grounding semantics and because ℛ₃ is parametrically uniform.

It is interesting to note that for the ground instances originating from R₁, there are two distinct probabilities under ME_⊛(ℛ₁), three probabilities under ME_⊛(ℛ₂), and, as implied by Proposition 2, one probability under ME_⊛(ℛ₃). In all cases, PCI-aggregation semantics ensures that the distinct probabilities aggregate to the probability stated in the corresponding conditionals.

For the comparison of PCI-grounding and PCI-aggregation, it is also interesting to compare their ME behavior with respect to queries that are not instances of a conditional given in the knowledge base. For example, for likes(b, c) we observe

\begin{array}{l} \mathit{ME}_{\circledast}(\mathcal{R}_{1})(\mathit{likes}(b, c)) = 0.64220609 \\ \mathit{ME}_{\circledast}(\mathcal{R}_{2})(\mathit{likes}(b, c)) = 0.64490162 \\ \mathit{ME}_{\circledast}(\mathcal{R}_{3})(\mathit{likes}(b, c)) = 0.58699481 \end{array}

and for likes(b, a) we get

\begin{array}{l} \mathit{ME}_{\circledast}(\mathcal{R}_{1})(\mathit{likes}(b, a)) = 0.10504266 \\ \mathit{ME}_{\circledast}(\mathcal{R}_{2})(\mathit{likes}(b, a)) = 0.64490162 \\ \mathit{ME}_{\circledast}(\mathcal{R}_{3})(\mathit{likes}(b, a)) = 0.05000000 \end{array}

for PCI-aggregation semantics, while

\begin{array}{l} \mathit{ME}_{\triangle}(\mathcal{R}_{i})(\mathit{likes}(b, c)) = 0.58699481 \\ \mathit{ME}_{\triangle}(\mathcal{R}_{i})(\mathit{likes}(b, a)) = 0.05000000 \end{array}

holds for i ∈ {1, 2, 3} under PCI-grounding semantics.
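The two PCI-grounding values can be checked by hand (a derivation of our own, not taken from the paper; l(x, y) abbreviates likes(x, y)). Every ground conditional of ℛ_MI mentions only the atoms of one pair {l(x, y), l(y, x)}, so ME_△ factorizes over these pairs. For the pair (a, b), the two instances of R₁ and one instance of R₂ already determine the marginal. For the pair (b, c), only the two instances of R₁ apply; they force P(l(b, c)) = P(l(c, b)) =: x with joint cells 0.9x (both), 0.1x and 0.1x (only one), and 1 − 1.1x (neither), and x is chosen by maximizing the entropy of this pair:

```latex
P(l(a,b) \land l(b,a)) = 0.9\,P(l(b,a)) = 0.9\,P(l(a,b)),
\quad P(l(a,b)) = 0.05
\;\Longrightarrow\; P(l(b,a)) = 0.05

H(x) = -0.9x\ln(0.9x) - 2 \cdot 0.1x\ln(0.1x) - (1-1.1x)\ln(1-1.1x)

H'(x) = 0 \iff 1.1\,\ln\frac{1-1.1x}{x} = 0.9\ln 0.9 + 0.2\ln 0.1
\;\Longrightarrow\;
x = \frac{1}{1.1 + \exp\!\big((0.9\ln 0.9 + 0.2\ln 0.1)/1.1\big)} \approx 0.586995
```

This agrees with the value 0.58699481 above. Note that the value 0.05 for likes(b, a) holds for every distribution satisfying the ground instances, not only for the ME model.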

6. Conclusions and Further Work

In this paper, we considered maximum entropy based semantics for relational probabilistic conditionals. FO-PCL [16] employs a grounding semantics and uses instantiation restrictions for the free variables occurring in a conditional, requiring all admissible instances of a conditional to have the given probability. Aggregating semantics [15] defines probabilistic satisfaction by interpreting the intended probability of a conditional with free variables only as a guideline for the probabilities of its instances that aggregate to the conditional’s given probability, while the actual probabilities for grounded instances may differ.

While the original definition of aggregation semantics [15] considered only conditionals without constraints representing instantiation restrictions, we developed the framework PCI extending aggregation semantics so that instantiation restrictions can also be taken into account, but without giving up the flexibility of aggregating over distinct probabilities. In comparison with [15], under PCI-aggregation semantics one can restrict the set of groundings of a conditional over which aggregating with respect to a conditional takes place by providing a corresponding constraint formula for the conditional. From a knowledge representation point of view, this can be useful in various situations, for instance when we talk about a particular relationship among individuals while already knowing that a specific individual like Clyde is an exception with respect to the given relationship.

Note that PCI captures both grounding semantics and aggregating semantics without instantiation restrictions as special cases. If a knowledge base is parametrically uniform, PCI-grounding and PCI-aggregation semantics coincide when employing the maximum entropy principle, while for a knowledge base that is not parametrically uniform, the two ME semantics induce different models in general. We illustrated the differences and common features of both semantics on a concrete knowledge base, using the KReator environment for computing the ME models and answering queries with respect to these distributions. We expect that observations of this kind will support the discussion of both formal and common-sense properties of probabilistic first-order inference in general and of inference according to the principle of maximum entropy in a first-order setting in particular.

Acknowledgments

We would like to thank the anonymous referees of this paper for their helpful comments and valuable suggestions.

Author Contributions

This paper is joint work by the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Pearl, J. Probabilistic Reasoning in Intelligent Systems; Morgan Kaufmann: San Mateo, CA, USA, 1988.
2. Cowell, R.; Dawid, A.; Lauritzen, S.; Spiegelhalter, D. Probabilistic Networks and Expert Systems; Springer: New York, NY, USA; Berlin/Heidelberg, Germany, 1999.
3. Jaynes, E. Papers on Probability, Statistics and Statistical Physics; D. Reidel Publishing Company: Dordrecht, The Netherlands, 1983.
4. Shore, J.; Johnson, R. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory 1980, IT-26, 26–37.
5. Paris, J. The Uncertain Reasoner's Companion: A Mathematical Perspective; Cambridge University Press: Cambridge, UK, 1994.
6. Paris, J.; Vencovska, A. In defence of the maximum entropy inference process. Int. J. Approx. Reason. 1997, 17, 77–103.
7. Paris, J. Common Sense and Maximum Entropy. Synthese 1999, 117, 75–93.
8. Kern-Isberner, G. Conditionals in Nonmonotonic Reasoning and Belief Revision; Lecture Notes in Artificial Intelligence LNAI 2087; Springer: Berlin/Heidelberg, Germany, 2001.
9. Paris, J.B. What You See Is What You Get. Entropy 2014, 16, 6186–6194.
10. Delgrande, J. On first-order conditional logics. Artif. Intell. 1998, 105, 105–137.
11. Fisseler, J. Learning and Modeling with Probabilistic Conditional Logic; Dissertations in Artificial Intelligence; IOS Press: Amsterdam, The Netherlands, 2010; Volume 328.
12. Halpern, J. Reasoning about Uncertainty; MIT Press: Cambridge, MA, USA, 2005.
13. Getoor, L.; Taskar, B. Introduction to Statistical Relational Learning; Getoor, L., Taskar, B., Eds.; MIT Press: Cambridge, MA, USA, 2007.
14. Kern-Isberner, G.; Beierle, C.; Finthammer, M.; Thimm, M. Comparing and Evaluating Approaches to Probabilistic Reasoning: Theory, Implementation, and Applications. In Transactions on Large-Scale Data- and Knowledge-Centered Systems VI; Hameurlain, A., Küng, J., Wagner, R., Liddle, S.W., Schewe, K.-D., Zhou, X., Eds.; Lecture Notes in Computer Science, Volume 7600; Springer: Berlin/Heidelberg, Germany, 2012; pp. 31–75.
15. Kern-Isberner, G.; Thimm, M. Novel Semantical Approaches to Relational Probabilistic Conditionals. In Proceedings of the Twelfth International Conference on the Principles of Knowledge Representation and Reasoning, KR'2010, Toronto, ON, Canada, 9–13 May 2010; Lin, F., Sattler, U., Truszczynski, M., Eds.; AAAI Press: Menlo Park, CA, USA, 2010; pp. 382–391.
16. Fisseler, J. First-order probabilistic conditional logic and maximum entropy. Log. J. IGPL 2012, 20, 796–830.
17. Finthammer, M.; Beierle, C. How to Exploit Parametric Uniformity for Maximum Entropy Reasoning in a Relational Probabilistic Logic. In Logics in Artificial Intelligence, Proceedings of the 13th European Conference, JELIA 2012, Toulouse, France, 26–28 September 2012; Fariñas del Cerro, L., Herzig, A., Mengin, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7519, pp. 189–201.
18. Rödder, W.; Reucher, E.; Kulmann, F. Features of the Expert-System-Shell SPIRIT. Log. J. IGPL 2006, 14, 483–500.
19. Finthammer, M.; Beierle, C. Instantiation Restrictions for Relational Probabilistic Conditionals. In Scalable Uncertainty Management, Proceedings of the 6th International Conference on Scalable Uncertainty Management, Marburg, Germany, 17–19 September 2012; Hüllermeier, E., Link, S., Fober, T., Seeger, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7520, pp. 598–605.
20. Beierle, C.; Krämer, A. Achieving Parametric Uniformity for Knowledge Bases in a Relational Probabilistic Conditional Logic with Maximum Entropy Semantics. Ann. Math. Artif. Intell. 2015, 73, 5–45.
21. Beierle, C.; Höhnerbach, M.; Marto, M. Implementation of a Transformation System for Relational Probabilistic Knowledge Bases Simplifying the Maximum Entropy Model Computation. In Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference (FLAIRS 2014), Pensacola Beach, FL, USA, 21–23 May 2014; AAAI Press: Menlo Park, CA, USA, 2014; pp. 486–489.
22. Finthammer, M.; Thimm, M. An Integrated Development Environment for Probabilistic Relational Reasoning. Log. J. IGPL 2012, 20, 831–871.
23. Beierle, C.; Kuche, S.; Finthammer, M.; Kern-Isberner, G. A Software System for the Computation, Visualization, and Comparison of Conditional Structures for Relational Probabilistic Knowledge Bases. In Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2015), Hollywood, FL, USA, 18–20 May 2015; AAAI Press: Menlo Park, CA, USA, 2015; in press.

Figure 1. The KReator protocol of the $PU$ transformation steps from $ℛ_{MI} = ℛ_1$ to $ℛ_2$ and from $ℛ_2$ to $ℛ_3 = PU(ℛ_{MI})$ for $ℛ_{MI}$ from Example 3.

Table 1. Conditionals occurring in $ℛ_1$, $ℛ_2$, and $ℛ_3$ given by the $PU$ transformation steps from $ℛ_{MI} = ℛ_1$ to $ℛ_2$ and from $ℛ_2$ to $ℛ_3 = PU(ℛ_{MI})$ for $ℛ_{MI}$ from Example 3 (cf. Figure 1), using the abbreviation l(x, y) for likes(x, y). Conditional $R_1$ in $ℛ_1$ is replaced by $R_{1·1}$ and $R_{1·2}$ in $ℛ_2$, and conditional $R_{1·2}$ in $ℛ_2$ is replaced by $R_{1·2·1}$ and $R_{1·2·2}$ in $ℛ_3$.

| $ℛ_1$ | $ℛ_2$ | $ℛ_3$ |
|---|---|---|
| $R_1$: 〈(l(U, V) \| l(V, U))[0.9], U ≠ V〉 | $R_{1·1}$: 〈(l(U, a) \| l(a, U))[0.9], U ≠ a〉 | $R_{1·1}$ (unchanged) |
| | $R_{1·2}$: 〈(l(U, V) \| l(V, U))[0.9], U ≠ V, V ≠ a〉 | $R_{1·2·1}$: 〈(l(a, V) \| l(V, a))[0.9], V ≠ a〉 |
| | | $R_{1·2·2}$: 〈(l(U, V) \| l(V, U))[0.9], U ≠ V, V ≠ a, U ≠ a〉 |
| $R_2$: 〈(l(a, V) \| ⊤)[0.05], V ≠ a〉 | $R_2$ (unchanged) | $R_2$ (unchanged) |
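One way to check the transformation steps in Table 1 is to enumerate the ground instances of each conditional and confirm that every $PU$ step only splits the instances of the replaced conditional into disjoint parts, neither adding nor losing any. The following is a minimal Python sketch under stated assumptions: the domain is taken to be {a, b, c} (the constants occurring in Table 2), constraint formulas are encoded as Python predicates on a substitution, and the helpers `subst`, `ground`, `fmt`, and the tuple encoding of atoms are hypothetical illustrations, not KReator's data structures.

```python
from itertools import product

D = ("a", "b", "c")  # assumed domain: the constants occurring in Table 2


def l(x, y):
    """Atom l(x, y), abbreviating likes(x, y); upper-case terms are variables."""
    return ("l", (x, y))


TOP = ("top", ())  # the tautology ⊤


def subst(atom, s):
    """Apply substitution s to an atom; constants (not keys of s) stay unchanged."""
    pred, terms = atom
    return (pred, tuple(s.get(t, t) for t in terms))


def ground(cons, ante, variables, constraint):
    """Set of ground instances (cons | ante) over D whose substitution
    satisfies the constraint formula (hypothetical predicate encoding)."""
    instances = set()
    for values in product(D, repeat=len(variables)):
        s = dict(zip(variables, values))
        if constraint(s):
            instances.add((subst(cons, s), subst(ante, s)))
    return instances


def fmt(atom):
    pred, terms = atom
    return "⊤" if pred == "top" else f"{pred}({', '.join(terms)})"


# The conditionals of Table 1 (probabilities omitted; only instances matter here).
R1 = ground(l("U", "V"), l("V", "U"), ("U", "V"), lambda s: s["U"] != s["V"])
R1_1 = ground(l("U", "a"), l("a", "U"), ("U",), lambda s: s["U"] != "a")
R1_2 = ground(l("U", "V"), l("V", "U"), ("U", "V"),
              lambda s: s["U"] != s["V"] and s["V"] != "a")
R1_2_1 = ground(l("a", "V"), l("V", "a"), ("V",), lambda s: s["V"] != "a")
R1_2_2 = ground(l("U", "V"), l("V", "U"), ("U", "V"),
                lambda s: s["U"] != s["V"] and s["V"] != "a" and s["U"] != "a")
R2 = ground(l("a", "V"), TOP, ("V",), lambda s: s["V"] != "a")

# Each PU step partitions the ground instances of the conditional it replaces.
assert (R1_1 | R1_2) == R1 and not (R1_1 & R1_2)
assert (R1_2_1 | R1_2_2) == R1_2 and not (R1_2_1 & R1_2_2)

for name, inst in [("R1.1", R1_1), ("R1.2.1", R1_2_1), ("R1.2.2", R1_2_2), ("R2", R2)]:
    print(name, sorted(f"({fmt(c)}|{fmt(a)})" for c, a in inst))
```

The printed instances are exactly the ground instances listed row by row in Table 2, which makes it easy to see which instance of $R_1$ ends up in which conditional of $ℛ_3$.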

Table 2. Maximum entropy probabilities of the ground instances of the conditionals in $ℛ_1$, $ℛ_2$, and $ℛ_3$ (cf. Table 1) under PCI-aggregation semantics. For PCI-grounding semantics, $ME_{\triangle}(ℛ_1)(g) = ME_{\triangle}(ℛ_2)(g) = ME_{\triangle}(ℛ_3)(g) = ME_{\circledast}(ℛ_3)(g)$ holds, since the $PU$ transformation process does not change the maximum entropy model under grounding semantics and $ℛ_3$ is parametrically uniform.

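| $ℛ_1$ | $ℛ_2$ | $ℛ_3$ | ground instance $g$ | $ME_{\circledast}(ℛ_1)(g)$ | $ME_{\circledast}(ℛ_2)(g)$ | $ME_{\circledast}(ℛ_3)(g)$ |
|---|---|---|---|---|---|---|
| $R_1$ | $R_{1·1}$ | $R_{1·1}$ | (l(b, a) \| l(a, b)) | 0.96674480 | 0.90000000 | 0.90000000 |
| $R_1$ | $R_{1·1}$ | $R_{1·1}$ | (l(c, a) \| l(a, c)) | 0.96674480 | 0.90000000 | 0.90000000 |
| $R_1$ | $R_{1·2}$ | $R_{1·2·1}$ | (l(a, b) \| l(b, a)) | 0.46016768 | 0.45380549 | 0.89999999 |
| $R_1$ | $R_{1·2}$ | $R_{1·2·1}$ | (l(a, c) \| l(c, a)) | 0.46016768 | 0.45380549 | 0.89999999 |
| $R_1$ | $R_{1·2}$ | $R_{1·2·2}$ | (l(b, c) \| l(c, b)) | 0.96674480 | 0.96860780 | 0.90000000 |
| $R_1$ | $R_{1·2}$ | $R_{1·2·2}$ | (l(c, b) \| l(b, c)) | 0.96674480 | 0.96860780 | 0.90000000 |
| $R_2$ | $R_2$ | $R_2$ | l(a, b) | 0.05000000 | 0.05000000 | 0.05000000 |
| $R_2$ | $R_2$ | $R_2$ | l(a, c) | 0.05000000 | 0.05000000 | 0.05000000 |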
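The two columns of behaviour in Table 2 come from how a relational conditional 〈(B|A)[x], C〉 is turned into constraints. Following the usual definitions, grounding semantics requires $P(A_g B_g) = x \cdot P(A_g)$ for every admissible ground instance $g$, whereas aggregating semantics only requires $\sum_g P(A_g B_g) = x \cdot \sum_g P(A_g)$. The minimal sketch below illustrates this for $ℛ_1$, assuming the domain {a, b, c} and NumPy. Each constraint becomes a feature $f$ with $E_P[f] = 0$, and the ME distribution $P \propto \exp(F \lambda)$ is found by damped Newton descent on the convex dual $\log \sum_w \exp(F(w)\lambda)$. This solver is a generic choice made here for illustration, not necessarily the method used by KReator, and all function names are hypothetical.

```python
import itertools

import numpy as np

D = ["a", "b", "c"]                     # assumed domain: the constants of Table 2
ATOMS = [(x, y) for x in D for y in D]  # ground atoms likes(x, y)
IDX = {at: k for k, at in enumerate(ATOMS)}
# One row per possible world (2^9 = 512); column k is the truth value of ATOMS[k].
WORLDS = ((np.arange(2 ** len(ATOMS))[:, None] >> np.arange(len(ATOMS))) & 1).astype(bool)
TRUE = np.ones(len(WORLDS), dtype=bool)


def l(x, y):
    """Truth value of likes(x, y) in every world."""
    return WORLDS[:, IDX[(x, y)]]


# Ground instances (consequent, antecedent, probability) of the conditionals of R_1.
R1 = [(l(u, v), l(v, u), 0.9) for u, v in itertools.permutations(D, 2)]
R2 = [(l("a", v), TRUE, 0.05) for v in D if v != "a"]


def feature(cons, ante, x):
    """f = (1 - x)[A and B] - x[A and not B]; E_P[f] = 0 iff P(AB) = x P(A)."""
    return (1 - x) * (ante & cons) - x * (ante & ~cons)


def feature_matrix(conditionals, aggregate):
    """One column per relational conditional (aggregating: features summed over
    its ground instances) or one column per ground instance (grounding)."""
    cols = []
    for instances in conditionals:
        fs = [feature(c, a, x) for c, a, x in instances]
        cols.extend([sum(fs)] if aggregate else fs)
    return np.column_stack(cols)


def dist(F, lam):
    """Normalized distribution P(w) proportional to exp(F(w) . lam)."""
    s = F @ lam
    p = np.exp(s - s.max())
    return p / p.sum()


def log_z(F, lam):
    s = F @ lam
    return s.max() + np.log(np.exp(s - s.max()).sum())


def max_entropy(F, tol=1e-10, max_iter=200):
    """ME distribution subject to E_P[F] = 0, via damped Newton descent on the
    dual log Z(lam); its gradient is E_P[F] and its Hessian is Cov_P[F]."""
    lam = np.zeros(F.shape[1])
    for _ in range(max_iter):
        p = dist(F, lam)
        g = F.T @ p
        if np.abs(g).max() < tol:
            break
        H = F.T @ (F * p[:, None]) - np.outer(g, g)
        step = np.linalg.solve(H, -g)
        t, base = 1.0, log_z(F, lam)
        # Backtracking (Armijo) line search keeps every update a descent step.
        while t > 1e-12 and log_z(F, lam + t * step) > base + 0.25 * t * (g @ step):
            t /= 2
        lam = lam + t * step
    return dist(F, lam)


def cond_prob(p, cons, ante):
    return p[ante & cons].sum() / p[ante].sum()


for aggregate, name in ((True, "aggregating"), (False, "grounding")):
    p = max_entropy(feature_matrix([R1, R2], aggregate))
    print(name,
          "P(l(a,b)|l(b,a)) =", round(cond_prob(p, l("a", "b"), l("b", "a")), 6),
          "P(l(b,c)|l(c,b)) =", round(cond_prob(p, l("b", "c"), l("c", "b")), 6))
```

By construction, the grounding run gives 0.9 for every instance of $R_1$, as the caption states for PCI-grounding semantics. If the assumed domain matches Example 3, the aggregating run should agree with the $ME_{\circledast}(ℛ_1)$ column (about 0.4602 and 0.9667). Under aggregation, a single parameter is shared by all six instances of $R_1$, so instances involving $a$ are pulled down by $R_2$ and the remaining ones compensate.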

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Beierle, C.; Finthammer, M.; Kern-Isberner, G. Relational Probabilistic Conditionals and Their Instantiations under Maximum Entropy Semantics for First-Order Knowledge Bases. Entropy 2015, 17, 852-865. https://doi.org/10.3390/e17020852
