Abstract
Maximum entropy distributions serve as favorable models for commonsense reasoning based on probabilistic conditional knowledge bases. Computing these distributions requires solving high-dimensional convex optimization problems, especially when the conditionals are composed of first-order formulas. In this paper, we propose a highly optimized variant of generalized iterative scaling for computing maximum entropy distributions. As a novel feature, our improved algorithm is able to take into account probabilistic independencies that are established by the principle of maximum entropy. This allows the logical information given by the knowledge base, represented as weighted conditional impact systems, to be exploited in a very condensed way.
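To make the algorithmic setting concrete, the following is a minimal sketch of classical generalized iterative scaling in the sense of Darroch and Ratcliff, not the optimized variant developed in this paper; the function name gis, the dense world-by-feature matrix, and the slack-feature construction are illustrative assumptions.

```python
import numpy as np

def gis(features, targets, iterations=2000, eps=1e-12):
    """Plain generalized iterative scaling (a sketch, not the paper's variant).

    features: (n_worlds, n_features) array of nonnegative feature values
              f_i(omega); targets: desired expectations of the features,
              assumed to be consistent with some probability distribution.
    """
    # GIS assumes the feature values of every world sum to a constant C;
    # a standard slack feature enforces this without changing the model.
    C = features.sum(axis=1).max()
    F = np.column_stack([features, C - features.sum(axis=1)])
    t = np.append(targets, C - np.sum(targets))
    log_alpha = np.zeros(F.shape[1])     # logarithms of the scaling factors
    for _ in range(iterations):
        p = np.exp(F @ log_alpha)
        p /= p.sum()                     # current model distribution
        expected = F.T @ p               # model expectations E_p[f_i]
        log_alpha += np.log((t + eps) / (expected + eps)) / C   # scaling step
    p = np.exp(F @ log_alpha)
    return p / p.sum()
```

In the setting of the paper, the features would stem from the verification and falsification counts of the ground conditionals, and the adjustment loop would run over weighted conditional impacts rather than over all possible worlds.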
Notes
1. In this paper, predicate and variable names will always begin with an uppercase letter and constant names with a lowercase letter.
2. Actually, the numbers of adjustment steps are smaller in both cases, since we group (partial) possible worlds with the same conditional impact together (weighted conditional impacts) and filter out “impossible” worlds beforehand; a small illustrative sketch of this grouping follows these notes.
3. We say that \(\mathcal {R}_1\) and \(\mathcal {R}_2\) share a ground atom \(A\in \mathcal {G}_\varSigma \) if there are \(r_1\in \mathcal {R}_1\) and \(r_2\in \mathcal {R}_2\) with ground instances \(r'_1\in \mathsf {Grnd}(r_1)\) and \(r'_2\in \mathsf {Grnd}(r_2)\) that both contain the ground atom A.
4. Consider the bijection \(\beta :\varOmega _{\mathcal {G}_c}\rightarrow \varOmega _{\mathcal {G}_d}\) which simply replaces the constant c with the constant d wherever c occurs.
5. This representation of \(\mathcal {P}^\mathsf {ME}_{\mathcal {R}}\) exists except for very rare pathological cases, which can be circumvented by prescient knowledge engineering.
6. More precisely, uniform marginals of the probability distribution are considered in order to avoid iterations over the whole probability distribution.
7. Correctness here means that \(\alpha _0,\alpha _1,\ldots ,\alpha _m\) can be calculated to any precision if the loop in Step 4 is executed sufficiently often.
8. Here, \(\varGamma \) is the set of all ordinary \(\mathsf {WCI}\)s with respect to the knowledge base \(\mathcal {R}\).
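As a concrete reading of footnote 2, the following hypothetical sketch groups possible worlds by their conditional impact vector; worlds with the same vector are interchangeable for the scaling steps, so only one representative per group needs to be processed, weighted by the group size. The interfaces (worlds as an iterable, conditionals as pairs of counting functions) are our own assumptions, not the paper's data structures.

```python
from collections import Counter

def weighted_conditional_impacts(worlds, conditionals):
    """Group worlds by their conditional impact vector (hypothetical interface).

    worlds:       iterable of possible worlds.
    conditionals: list of (ver, fal) pairs of functions, where ver(omega)
                  and fal(omega) count the verifying and falsifying ground
                  instances of the respective conditional in the world omega.
    Returns a Counter mapping each impact vector to its weight, i.e., the
    number of worlds sharing that impact.
    """
    wcis = Counter()
    for omega in worlds:
        impact = tuple((ver(omega), fal(omega)) for ver, fal in conditionals)
        wcis[impact] += 1
    return wcis
```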
Acknowledgements
This research was supported by the German Research Foundation (DFG) within Research Unit FOR 1513 on Hybrid Reasoning for Intelligent Systems.
Proofs of Results
Proposition 1
Let \(\mathcal {R}\) be a consistent knowledge base, and let \(\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\) be a syntax partition for \(\mathcal {R}\). For all \(\omega \in \varOmega \),
\[\mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega )=\prod _{j=1}^k \mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_j}).\]
Proof
We give a proof for those cases in which the representation (5) of \(\mathcal {P}^\mathsf {ME}_\mathcal {R}\) exists. The normalizing constant can be written as \(\alpha _0=\sum _{\omega \in \varOmega }\prod _{i=1}^m \alpha _i^{f_i(\omega )}\), where \(f_{i}(C)\) abbreviates \((1-p_i)\cdot \mathsf {ver}_{i}(C)-p_i\cdot \mathsf {fal}_i(C)\) for any ground formula \(C\in \mathsf {FOL}\). Further, let \(\mathfrak {R}=\{\mathcal {R}^1_G,\ldots ,\mathcal {R}^k_G\}\) be a \(\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\)-respecting decomposition of \(\mathcal {R}\) with \(\mathcal {R}^j_G=\{R^j_1,\ldots ,R^j_n\}\) for \(j=1,\ldots ,k\). Then, \(\alpha _0=\prod _{j=1}^k \alpha _0^j\) holds, where \(\alpha _0^j=\sum _{\omega _j\in \varOmega _{\mathcal {G}_j}}\prod _{i=1}^m \alpha _i^{f_i(\omega _j)}\). For \(\omega \in \varOmega \setminus \varOmega ^0\), it follows that
\[\mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega )=\frac{1}{\alpha _0}\prod _{i=1}^m \alpha _i^{f_i(\omega )}=\prod _{j=1}^k \frac{1}{\alpha _0^j}\prod _{i=1}^m \alpha _i^{f_i(\omega _{\mathcal {G}_j})}=\prod _{j=1}^k \mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_j}),\]
where the second equality uses \(f_i(\omega )=\sum _{j=1}^k f_i(\omega _{\mathcal {G}_j})\), which holds because \(\mathfrak {R}\) is a \(\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\)-respecting decomposition.
If \(\omega \in \varOmega ^0\), there is a deterministic conditional \(r=(B|A)[p]\in \mathcal {R}\) and an index \(l\in \{1,\ldots ,k\}\) such that \(\mathsf {ver}_{\mathsf {Grnd}(r)}(\omega _{\mathcal {G}_l})>0\) if \(p=0\), and \(\mathsf {fal}_{\mathsf {Grnd}(r)}(\omega _{\mathcal {G}_l})>0\) if \(p=1\). As a consequence, every \(\omega '\) with \(\omega '\,{\models }\,\omega _{\mathcal {G}_l}\) is a null-world, and hence \(\mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_l})=\sum _{\omega '\,\models \,\omega _{\mathcal {G}_l}}\mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega ')=0\), so that
\[\mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega )=0=\prod _{j=1}^k \mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_j}),\]
as required. \(\square \)
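The core of the argument is that the unnormalized weight of a world factorizes over the syntax partition, so the joint maximum entropy distribution is the product of its marginals. The following numeric check with synthetic weights (our own construction, not data from the paper) illustrates this mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.random(4)   # unnormalized weights of the partial worlds over G_1
w2 = rng.random(8)   # unnormalized weights of the partial worlds over G_2

# If the weight of a full world is the product of the weights of its parts
# (as derived from alpha_0 = prod_j alpha_0^j), the joint distribution
# factorizes into its marginals.
joint = np.outer(w1, w2)
joint /= joint.sum()
m1, m2 = joint.sum(axis=1), joint.sum(axis=0)   # marginal distributions
assert np.allclose(joint, np.outer(m1, m2))
```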
Proposition 2
Let \(\mathcal {R}\) be a knowledge base, let \(\mathfrak {G}=\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\) be a syntax partition for \(\mathcal {R}\), and let \(\mathfrak {R}\) be a \(\mathfrak {G}\)-respecting decomposition of \(\mathcal {R}\) as described above. If \(\omega \in \varOmega \) is not a null-world, then
\[{\varvec{\gamma }}_{\mathcal {R}_{\mathcal {G}}}(\omega )=\sum _{j=1}^k {\varvec{\gamma }}_{\mathcal {R}^j_G}(\omega _{\mathcal {G}_j}),\]
where the sum is taken componentwise. If \(\omega \) is a null-world, then \({\varvec{\gamma }}_{\mathcal {R}^j_G}(\omega _{\mathcal {G}_j})\) is undefined for at least one \(j\in \{1,\ldots ,k\}\).
Proof
Let \(\omega \in \varOmega \setminus \varOmega ^0\). By definition, \(\varvec{\gamma }_{\mathcal {R}_\mathcal {G}}(\omega )=((\mathsf {ver}_i(\omega ),\mathsf {fal}_i(\omega )))_{i=1,\ldots ,m}\). Since \(\mathfrak {R}\) is a \(\mathfrak {G}\)-respecting decomposition of \(\mathcal {R}\), \(\mathsf {ver}_{i}(\omega )=\sum _{j=1}^k \mathsf {ver}_{R^j_i}(\omega _{\mathcal {G}_j})\) as well as \(\mathsf {fal}_{i}(\omega )=\sum _{j=1}^k \mathsf {fal}_{R^j_i}(\omega _{\mathcal {G}_j})\) hold for \(i=1,\ldots ,n\), and hence, in particular, for \(i=1,\ldots ,m\) (since \(m\le n\)). By applying the definition of \(\varvec{\gamma }_{\mathcal {R}^j_G}(\omega _{\mathcal {G}_j})\), the proposition follows. As syntax partitions also take deterministic conditionals into account, the statement concerning null-worlds follows immediately. \(\square \)
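The additivity used in this proof can be made tangible with a toy example: for a conditional (Flies(X)|Bird(X)) grounded over two constants that fall into different groups of the syntax partition, the impact of a full world is the componentwise sum of the impacts of its partial worlds. The atom names and the dictionary encoding of worlds below are illustrative assumptions.

```python
def impact(world, ground_instances):
    """Count (ver, fal) for a conditional, given its ground instances.

    world:            dict mapping ground atoms to truth values.
    ground_instances: list of (antecedent, consequent) ground atom pairs.
    """
    ver = sum(world[a] and world[b] for a, b in ground_instances)
    fal = sum(world[a] and not world[b] for a, b in ground_instances)
    return ver, fal

# Ground instances of (Flies(X)|Bird(X)) for constants a and b, which lie
# in different groups of the syntax partition.
world = {"Bird(a)": True, "Flies(a)": True, "Bird(b)": True, "Flies(b)": False}
g1 = [("Bird(a)", "Flies(a)")]
g2 = [("Bird(b)", "Flies(b)")]

v1, f1 = impact(world, g1)
v2, f2 = impact(world, g2)
assert (v1 + v2, f1 + f2) == impact(world, g1 + g2)   # additive decomposition
```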