A Generalized Iterative Scaling Algorithm for Maximum Entropy Model Computations Respecting Probabilistic Independencies

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10833)

Abstract

Maximum entropy distributions serve as favorable models for commonsense reasoning based on probabilistic conditional knowledge bases. Computing these distributions requires solving high-dimensional convex optimization problems, especially if the conditionals are composed of first-order formulas. In this paper, we propose a highly optimized variant of generalized iterative scaling for computing maximum entropy distributions. As a novel feature, our improved algorithm is able to take probabilistic independencies into account that are established by the principle of maximum entropy. This allows for exploiting the logical information given by the knowledge base, represented as weighted conditional impact systems, in a very condensed way.
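For orientation, the following is a minimal sketch of classical generalized iterative scaling in the sense of Darroch and Ratcliff, not the optimized, independence-aware variant proposed in this paper. It assumes a hypothetical propositional knowledge base with the single conditional (B|A)[0.8] over two atoms; the names `shifted`, `slack`, `distribution`, and `gis` are illustrative. The feature (1-p)·ver − p·fal is shifted by p so that it is non-negative, and a slack feature makes the features sum to 1 in every world, as classical GIS requires.

```python
import math
from itertools import product

P = 0.8  # probability of the hypothetical conditional (B|A)[0.8]

# Possible worlds as truth assignments (A, B).
WORLDS = list(product([True, False], repeat=2))

def shifted(world):
    """Feature (1-P)*ver - P*fal, shifted by P so that it lies in [0, 1]."""
    a, b = world
    ver = 1 if a and b else 0        # conditional verified: A and B
    fal = 1 if a and not b else 0    # conditional falsified: A and not B
    return (1 - P) * ver - P * fal + P

def slack(world):
    """Slack feature that makes the two features sum to 1 in every world."""
    return 1.0 - shifted(world)

def distribution(lambdas, features):
    """Log-linear distribution induced by the current exponents."""
    weights = [math.exp(sum(l * f(w) for l, f in zip(lambdas, features)))
               for w in WORLDS]
    z = sum(weights)
    return {w: x / z for w, x in zip(WORLDS, weights)}

def gis(features, targets, iterations=200):
    """Classical GIS for non-negative features that sum to 1 in each world."""
    lambdas = [0.0] * len(features)
    for _ in range(iterations):
        dist = distribution(lambdas, features)
        for i, f in enumerate(features):
            expected = sum(p * f(w) for w, p in dist.items())
            # Additive update in log space: ratio of target to current expectation.
            lambdas[i] += math.log(targets[i] / expected)
    return distribution(lambdas, features)

# E[shifted] = P encodes P(B|A) = 0.8; the slack target is the remainder.
dist = gis([shifted, slack], [P, 1.0 - P])
p_a = dist[(True, True)] + dist[(True, False)]
print(dist[(True, True)] / p_a)  # conditional probability P(B|A) under the result
```

Every iteration of this sketch sums over all possible worlds, which is the cost that grouping worlds and exploiting independencies is meant to reduce.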

Notes

  1. In this paper, predicate and variable names will always begin with an uppercase letter and constant names with a lowercase letter.

  2. Actually, the numbers of adjustment steps are smaller in both cases since we group (partial) possible worlds with the same conditional impact together (weighted conditional impacts) and filter out “impossible” worlds beforehand.

  3. We say that \(\mathcal {R}_1\) and \(\mathcal {R}_2\) share a ground atom \(A\in \mathcal {G}_\varSigma \) if there are \(r_1\in \mathcal {R}_1\) and \(r_2\in \mathcal {R}_2\) with ground instances \(r'_1\in \mathsf {Grnd}(r_1)\) and \(r'_2\in \mathsf {Grnd}(r_2)\) that both contain the ground atom \(A\).

  4. Consider the bijection \(\beta :\varOmega _{\mathcal {G}_c}\rightarrow \varOmega _{\mathcal {G}_d}\) which simply replaces the constant c with the constant d whenever c occurs.

  5. This representation of \(\mathcal {P}^\mathsf {ME}_{\mathcal {R}}\) exists except for very rare pathological cases which can be circumvented by prescient knowledge engineering.

  6. More precisely, uniform marginals of the probability distribution are considered in order to avoid iterations over the whole probability distribution.

  7. Correctness here means that \(\alpha _0,\alpha _1,\ldots ,\alpha _m\) can be calculated with any precision if the loop in Step 4 is executed sufficiently often.

  8. Here, \(\varGamma \) is the set of all ordinary \(\mathsf {WCI}\)s with respect to the knowledge base \(\mathcal {R}\).

Acknowledgements

This research was supported by the German National Science Foundation (DFG), Research Unit FOR 1513 on Hybrid Reasoning for Intelligent Systems.

Author information

Corresponding author

Correspondence to Marco Wilhelm.

Proofs of Results

Proposition 1

Let \(\mathcal {R}\) be a consistent knowledge base, and let \(\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\) be a syntax partition for \(\mathcal {R}\). For all \(\omega \in \varOmega \),

$$ \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega )=\prod _{j=1}^k \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega _{\mathcal {G}_j}). $$

Proof

We give a proof for those cases in which the representation (5) of \(\mathcal {P}^\mathsf {ME}_\mathcal {R}\) exists. The normalizing constant can be written as \(\alpha _0=\big (\sum _{\omega \in \varOmega }\prod _{i=1}^m \alpha _i^{f_i(\omega )}\big )^{-1}\) where \(f_{X}(C)\) abbreviates \((1-p_i)\cdot \mathsf {ver}_{X}(C)-p_i\cdot \mathsf {fal}_X(C)\) for any ground formula \(C\in \mathsf {FOL}\). Further, let \(\mathfrak {R}=\{R^1_G,\ldots ,R^k_G\}\) be a \(\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\)-respecting decomposition of \(\mathcal {R}\) with \(R^j_G=\{R^j_1,\ldots ,R^j_n\}\) for \(j=1,\ldots ,k\). Since every \(f_i(\omega )\) decomposes into the sum \(\sum _{j=1}^k f_{R^j_i}(\omega _{\mathcal {G}_j})\), the sum over \(\varOmega \) factorizes into a product of sums over the \(\varOmega _{\mathcal {G}_j}\). Hence, \(\alpha _0=\prod _{j=1}^k \alpha _0^j\) holds where \(\alpha _0^j=\big (\sum _{\omega _j\in \varOmega _{\mathcal {G}_j}}\prod _{i=1}^m \alpha _i^{f_{R^j_i}(\omega _j)}\big )^{-1}\). For \(\omega \in \varOmega \setminus \varOmega ^0\), it follows that

$$\begin{aligned} \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega ) =&\alpha _0 \prod _{i=1}^m \alpha _i^{f_i(\omega )} = \alpha _0 \prod _{i=1}^m \prod _{j=1}^k \alpha _i^{f_{R^j_i}(\omega _{\mathcal {G}_j})} \\ =&\prod _{j=1}^k \Big [\left( \alpha _0^j \prod _{i=1}^m \alpha _i^{f_{R^j_i}(\omega _{\mathcal {G}_j})}\right) \cdot \prod _{l\ne j} \underbrace{\left( \sum _{\omega '_l\in \varOmega _{\mathcal {G}_l}} \alpha _0^l \prod _{i=1}^m \alpha _i^{f_{R^l_i}(\omega '_l)} \right) }_{=1}\Big ] \\ =&\prod _{j=1}^k \Big ( \sum _{\begin{array}{c} \omega '\in \varOmega \\ \omega '\,{\models }\,\omega _{\mathcal {G}_j} \end{array}} \alpha _0 \prod _{i=1}^m \prod _{l=1}^k \alpha _i^{f_{R^l_i}(\omega '_{\mathcal {G}_l})}\Big ) = \prod _{j=1}^k \Big ( \sum _{\begin{array}{c} \omega '\in \varOmega \\ \omega '\,{\models }\,\omega _{\mathcal {G}_j} \end{array}} \alpha _0 \prod _{i=1}^m \alpha _i^{f_i(\omega ')} \Big ) \\ =&\prod _{j=1}^k \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega _{\mathcal {G}_j}). \end{aligned}$$

If \(\omega \in \varOmega ^0\), there is a deterministic conditional \(r=(B|A)[p]\in \mathcal {R}\) and an index \(l\in \{1,\ldots ,k\}\) such that \(\mathsf {ver}_{\mathsf {Grnd}(r)}(\omega _{\mathcal {G}_l})>0\) if \(p=0\) and \(\mathsf {fal}_{\mathsf {Grnd}(r)}(\omega _{\mathcal {G}_l})>0\) if \(p=1\). As a consequence, every \(\omega '\) with \(\omega '\,{\models }\,\omega _{\mathcal {G}_l}\) is a null-world, and

$$ \prod _{j=1}^k \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega _{\mathcal {G}_j})= \left( \sum _{\omega '\,{\models }\,\omega _{\mathcal {G}_l}} \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega ')\right) \cdot \prod _{j\ne l} \mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_j})=0\cdot \prod _{j\ne l} \mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_j})=0 $$

as required. \(\square \)
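The factorization can be checked numerically on a toy instance. The sketch below assumes a hypothetical conditional (B(X)|A(X))[0.8] grounded with two constants, so that the ground atoms split into two groups, one per constant. The value `ALPHA` is an arbitrary positive factor rather than a fitted maximum entropy solution, because the algebraic argument only uses the product form of the distribution; all function names are illustrative.

```python
from itertools import product

P = 0.8      # probability p of a hypothetical conditional (B(X)|A(X))[p]
ALPHA = 3.0  # arbitrary positive factor alpha_1, not a fitted ME solution

def f_local(partial_world):
    """(1-p)*ver - p*fal of one ground instance in a partial world (A(x), B(x))."""
    a, b = partial_world
    ver = 1 if a and b else 0
    fal = 1 if a and not b else 0
    return (1 - P) * ver - P * fal

# Partial worlds over one group of atoms, and full worlds as pairs of them.
PARTIAL = list(product([True, False], repeat=2))
WORLDS = list(product(PARTIAL, repeat=2))

def joint():
    """Distribution alpha_0 * alpha_1^f(omega), with f additive over the groups."""
    weights = {w: ALPHA ** (f_local(w[0]) + f_local(w[1])) for w in WORLDS}
    z = sum(weights.values())  # z corresponds to 1 / alpha_0
    return {w: x / z for w, x in weights.items()}

def marginal(dist, j, partial_world):
    """Probability of a partial world: sum over all full worlds extending it."""
    return sum(p for w, p in dist.items() if w[j] == partial_world)

def max_factorization_error():
    """Largest deviation between P(omega) and the product of its marginals."""
    dist = joint()
    return max(abs(p - marginal(dist, 0, w[0]) * marginal(dist, 1, w[1]))
               for w, p in dist.items())

print(max_factorization_error())  # deviation should vanish up to rounding
```

Only the additivity of the exponent over the two groups is used here, which mirrors the role of the respecting decomposition in the proof.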

Proposition 2

Let \(\mathcal {R}\) be a knowledge base, let \(\mathfrak {G}\) be a syntax partition for \(\mathcal {R}\), and let \(\mathfrak {R}\) be a \(\mathfrak {G}\)-respecting decomposition of \(\mathcal {R}\) as described above. If \(\omega \in \varOmega \) is not a null-world, then

$$\begin{aligned} {\varvec{\gamma }}_{\mathcal {R}_G}(\omega )=\big (( \sum _{j=1}^k (\gamma _{R^j_i}(\omega _{\mathcal {G}_j})_i)_1, \sum _{j=1}^k (\gamma _{R^j_i}(\omega _{\mathcal {G}_j})_i)_2 )\big )_{i=1,\ldots ,m}. \end{aligned}$$

If \(\omega \) is a null-world, then \({\varvec{\gamma }}_{\mathcal {R}^j_G}(\omega _{\mathcal {G}_j})\) is undefined for at least one \(j\in \{1,\ldots ,k\}\).

Proof

Let \(\omega \in \varOmega \setminus \varOmega ^0\). By definition, \(\varvec{\gamma }_{\mathcal {R}_\mathcal {G}}(\omega )=((\mathsf {ver}_i(\omega ),\mathsf {fal}_i(\omega )))_{i=1,\ldots ,m}\). Since \(\mathfrak {R}\) is a \(\mathfrak {G}\)-respecting decomposition of \(\mathcal {R}\), \(\mathsf {ver}_{i}(\omega )=\sum _{j=1}^k \mathsf {ver}_{R^j_i}(\omega _{\mathcal {G}_j})\) as well as \(\mathsf {fal}_{i}(\omega )=\sum _{j=1}^k \mathsf {fal}_{R^j_i}(\omega _{\mathcal {G}_j})\) hold for \(i=1,\ldots ,n\), and hence, in particular, this holds for \(i=1,\ldots ,m\) (since \(m\le n\)). By applying the definition of \(\varvec{\gamma }_{R^j_i}(\omega _{\mathcal {G}_j})\), the proposition follows. As syntax partitions also take deterministic conditionals into account, the statement concerning null-worlds follows immediately. \(\square \)
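As an illustration of this additivity, the following sketch counts verifications and falsifications for a hypothetical conditional (B(X)|A(X)) with two constants, once directly on full worlds and once by summing the counts of the partial worlds. It then groups worlds with the same impact, taking the number of grouped worlds as the weight, which is an assumption about how weighted conditional impacts are formed based on the grouping described in the notes. The constants and function names are illustrative, and the combination step is hard-coded for two groups.

```python
from collections import Counter
from itertools import product

CONSTANTS = ("c", "d")  # hypothetical constants; one group of atoms per constant
PARTIAL = list(product([True, False], repeat=2))  # partial worlds (A(x), B(x))

def local_impact(partial_world):
    """(ver, fal) of the ground instance (B(x)|A(x)) in one partial world."""
    a, b = partial_world
    return (int(a and b), int(a and not b))

def global_impact(world):
    """(ver, fal) counted directly over all ground instances in a full world."""
    ver = sum(1 for x in CONSTANTS if world[x][0] and world[x][1])
    fal = sum(1 for x in CONSTANTS if world[x][0] and not world[x][1])
    return (ver, fal)

def all_worlds():
    """Enumerate full worlds as mappings from constants to partial worlds."""
    for parts in product(PARTIAL, repeat=len(CONSTANTS)):
        yield dict(zip(CONSTANTS, parts))

def additive(world):
    """Check that the global impact is the sum of the local impacts."""
    local = [local_impact(world[x]) for x in CONSTANTS]
    summed = (sum(v for v, _ in local), sum(f for _, f in local))
    return summed == global_impact(world)

# Group worlds with the same impact; the count plays the role of a weight.
local_wci = Counter(local_impact(w) for w in PARTIAL)
global_wci = Counter(global_impact(w) for w in all_worlds())

# Combine the local groups of both constants: impacts add up, weights multiply.
combined = Counter()
for (i1, w1), (i2, w2) in product(local_wci.items(), repeat=2):
    combined[(i1[0] + i2[0], i1[1] + i2[1])] += w1 * w2

print(all(additive(w) for w in all_worlds()), combined == global_wci)
```

In this toy setting the grouped representation built from the partial worlds coincides with the one obtained by enumerating all full worlds, while containing fewer entries than there are worlds.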

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Wilhelm, M., Kern-Isberner, G., Finthammer, M., Beierle, C. (2018). A Generalized Iterative Scaling Algorithm for Maximum Entropy Model Computations Respecting Probabilistic Independencies. In: Ferrarotti, F., Woltran, S. (eds) Foundations of Information and Knowledge Systems. FoIKS 2018. Lecture Notes in Computer Science, vol 10833. Springer, Cham. https://doi.org/10.1007/978-3-319-90050-6_21

  • DOI: https://doi.org/10.1007/978-3-319-90050-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90049-0

  • Online ISBN: 978-3-319-90050-6

  • eBook Packages: Computer Science, Computer Science (R0)
