Bayes Reasoning
Adapted from slides by Tim Finin
Today’s topics
Review probability theory
Bayesian inference
From the joint distribution
Using independence/factoring
Bayesian Nets
Sources of Uncertainty
Uncertain inputs -- missing and/or noisy data
Uncertain knowledge
Multiple causes lead to multiple effects
Probabilistic/stochastic effects
Uncertain outputs
Abduction and induction are inherently uncertain
Why probabilities anyway?
Kolmogorov showed that three simple axioms lead to the
rules of probability theory
1. All probabilities are between 0 and 1:
0 ≤ P(a) ≤ 1
2. Valid propositions (tautologies) have probability 1, and
unsatisfiable propositions have probability 0:
P(true) = 1, P(false) = 0
3. The probability of a disjunction is:
P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
Probability theory 101
Random variables: Alarm, Burglary, Earthquake
Domain: Boolean (like these), discrete, or continuous
Atomic event: complete specification of state
  Alarm=T ∧ Burglary=T ∧ Earthquake=F
  alarm ∧ burglary ∧ ¬earthquake
Prior probability: degree of belief without any other evidence
  P(Burglary) = 0.1
  P(Alarm) = 0.1
  P(earthquake) = 0.000003
Joint probability: matrix of combined probabilities of a set of variables
  P(Alarm, Burglary)
Probability theory
Conditional probability: prob. of effect given causes
  P(burglary | alarm) = .47
  P(alarm | burglary) = .9
Computing conditional probs: P(a | b) = P(a ∧ b) / P(b)
  P(b): normalizing constant
  P(burglary | alarm) = P(burglary ∧ alarm) / P(alarm) = .09/.19 = .47
Product rule: P(a ∧ b) = P(a | b) * P(b)
  P(burglary ∧ alarm) = P(burglary | alarm) * P(alarm) = .47 * .19 = .09
Marginalizing: P(B) = Σa P(B, a)
  P(B) = Σa P(B | a) P(a) (conditioning)
  P(alarm) = P(alarm ∧ burglary) + P(alarm ∧ ¬burglary) = .09 + .1 = .19
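The three rules on this slide can be checked numerically. The sketch below stores a joint table for Alarm and Burglary in a dict; the two alarm=True entries (.09 and .1) come from the slide, while the two alarm=False entries are assumptions chosen so that P(burglary) = 0.1 and the table sums to 1. The function names are illustrative only.

```python
# Joint distribution P(Alarm, Burglary), keyed by (alarm, burglary).
# The alarm=True entries are from the slide; the alarm=False entries
# are ASSUMED so that P(burglary) = 0.1 and the table sums to 1.
joint = {
    (True, True): 0.09,
    (True, False): 0.10,
    (False, True): 0.01,   # assumed
    (False, False): 0.80,  # assumed
}

def p_alarm(alarm):
    """Marginalizing: P(Alarm) = sum over b of P(Alarm, b)."""
    return sum(p for (a, _), p in joint.items() if a == alarm)

def p_burglary_given_alarm(burglary, alarm):
    """Conditional probability: P(b | a) = P(a, b) / P(a)."""
    return joint[(alarm, burglary)] / p_alarm(alarm)

marginal = p_alarm(True)
conditional = p_burglary_given_alarm(True, True)
product = conditional * marginal  # product rule recovers P(burglary, alarm)

print(round(marginal, 2), round(conditional, 2), round(product, 2))
# prints: 0.19 0.47 0.09
```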
Example: Inference from the joint
What is the prior probability of smart? 0.8
What is the prior probability of study? 0.6
What is the conditional probability of prepared,
given study and smart?
P(prepared,smart,study)/P(smart,study) = 0.9
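The three answers above all come from summing entries of a joint table. The sketch below shows one way to do that. The four study=True entries are the numbers used later in the slides; the four study=False entries are assumptions, chosen only so that the table reproduces P(smart) = 0.8 and P(study) = 0.6. The helper name `prob` is hypothetical.

```python
# Joint over (smart, study, prepared). The study=True entries appear in the
# slides; the study=False entries are ASSUMED to match the stated priors.
joint = {
    (True,  True,  True):  0.432,
    (True,  True,  False): 0.048,
    (False, True,  True):  0.084,
    (False, True,  False): 0.036,
    (True,  False, True):  0.160,  # assumed
    (True,  False, False): 0.160,  # assumed
    (False, False, True):  0.008,  # assumed
    (False, False, False): 0.072,  # assumed
}
NAMES = ("smart", "study", "prepared")

def prob(**fixed):
    """Sum the atomic events consistent with the given variable values."""
    return sum(
        p for event, p in joint.items()
        if all(event[NAMES.index(name)] == value for name, value in fixed.items())
    )

p_smart = prob(smart=True)
p_study = prob(study=True)
# P(prepared | smart, study) = P(prepared, smart, study) / P(smart, study)
p_prepared_given = prob(prepared=True, smart=True, study=True) / prob(smart=True, study=True)

print(round(p_smart, 3), round(p_study, 3), round(p_prepared_given, 3))
```

Any query over these three variables can be answered the same way, which is why the size of the joint table becomes the bottleneck as variables are added.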
When sets of variables don’t affect each others’ probabilities,
we call them independent, and can easily compute their joint
and conditional probability:
Independent(A, B) → P(A ∧ B) = P(A) * P(B), P(A | B) = P(A)
{moonPhase, lightLevel} might be independent of {burglary,
alarm, earthquake}
Maybe not: crooks may be more likely to burglarize houses
when there is little light
Is smart conditionally independent of prepared, given study?
– Test: does P(smart ∧ prepared | study) = P(smart | study) * P(prepared | study)?
– P(smart ∧ prepared | study) = P(smart ∧ prepared ∧ study) / P(study)
= .432 / (.432 + .048 + .084 + .036) = .432/.6 = .72
– P(smart | study) * P(prepared | study) = .8 * .86 = .688
– .72 ≠ .688, so smart and prepared are not conditionally independent given study
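The same test can be run mechanically. The sketch below uses only the four study=True entries that are summed in the calculation above; conditioning on study is done by renormalizing that slice. Variable names are illustrative.

```python
# Joint entries with study=True, keyed by (smart, prepared).
study_slice = {
    (True, True): 0.432,
    (True, False): 0.048,
    (False, True): 0.084,
    (False, False): 0.036,
}
p_study = sum(study_slice.values())

# Conditioning on study: divide every entry in the slice by P(study).
given_study = {key: p / p_study for key, p in study_slice.items()}

p_smart = sum(p for (smart, _), p in given_study.items() if smart)
p_prepared = sum(p for (_, prepared), p in given_study.items() if prepared)
p_both = given_study[(True, True)]

# Conditional independence would require p_both == p_smart * p_prepared.
independent = abs(p_both - p_smart * p_prepared) < 1e-9
print(round(p_both, 3), round(p_smart * p_prepared, 3), independent)
# prints: 0.72 0.688 False
```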
Bayes’ rule
Derived from the product rule:
P(C | E) = P(E | C) * P(C) / P(E)
Often useful for diagnosis:
If E are (observed) effects and C are (hidden) causes,
We may have a model for how causes lead to effects
(P(E | C))
We may also have prior beliefs (based on experience)
about the frequency of occurrence of causes (P(C))
Which allows us to reason abductively from effects
to causes (P(C | E))
Ex: meningitis and stiff neck
Meningitis (M) can cause a stiff neck (S), though
there are many other causes for S, too
We’d like to use S as a diagnostic symptom and
estimate p(M|S)
Studies can easily estimate p(M), p(S) and p(S|M)
p(S|M)=0.7, p(S)=0.01, p(M)=0.00002
Applying Bayes’ Rule:
p(M|S) = p(S|M) * p(M) / p(S) = 0.0014
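As a quick check of the arithmetic, Bayes' rule is a one-line function. This is a minimal sketch; the function and parameter names are illustrative, and the inputs are the three estimates given above.

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """Bayes' rule: P(C | E) = P(E | C) * P(C) / P(E)."""
    return p_effect_given_cause * p_cause / p_effect

# p(S|M) = 0.7, p(M) = 0.00002, p(S) = 0.01
p_m_given_s = bayes(p_effect_given_cause=0.7, p_cause=0.00002, p_effect=0.01)
print(round(p_m_given_s, 6))
```

Even though 70% of meningitis patients have a stiff neck, the posterior stays small because the prior p(M) is tiny compared with p(S).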
Bayesian inference
Hypotheses Hi; evidence/manifestations E1, …, Ej, …, Em
Know the prior probability of each hypothesis, P(Hi),
and the conditional probability P(Ej | Hi)
Want to compute the posterior probability P(Hi | Ej)
Bayes’s theorem (formula 1):
P(Hi | Ej) = P(Hi) * P(Ej | Hi) / P(Ej)
Simple Bayesian
diagnostic reasoning
Also known as: Naive Bayes classifier
Knowledge base:
Evidence / manifestations: E1, … Em
Hypotheses / disorders: H1, … Hn
Note: Ej and Hi are binary; hypotheses are mutually exclusive
(non-overlapping) and exhaustive (cover all possible cases)
Conditional probabilities: P(Ej | Hi), i = 1, … n; j = 1, … m
Cases (evidence for a particular instance): E1, …, El
Goal: Find the hypothesis Hi with the highest posterior
max_i P(Hi | E1, …, El)
Bayes’ rule says that
P(Hi | E1… Em) = P(E1…Em | Hi) P(Hi) / P(E1… Em)
Assume each piece of evidence Ej is conditionally
independent of the others, given a hypothesis Hi; then:
P(E1…Em | Hi) = Π(j=1..m) P(Ej | Hi)
If we only care about relative probabilities for the
Hi, then we have:
P(Hi | E1…Em) = α P(Hi) Π(j=1..m) P(Ej | Hi)
where α = 1 / P(E1…Em) is a normalizing constant, the same for every Hi
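The relative-posterior formula translates directly into code: multiply each prior by the per-evidence likelihoods, then normalize. The sketch below is only an illustration; the hypotheses, the evidence names, and every number in it are hypothetical and do not come from the slides. Because each Ej is binary, an absent finding contributes 1 - P(Ej | Hi).

```python
from math import prod

# HYPOTHETICAL knowledge base: priors P(Hi) and likelihoods P(Ej | Hi).
priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
likelihoods = {
    "flu":     {"fever": 0.90, "cough": 0.80},
    "cold":    {"fever": 0.20, "cough": 0.70},
    "healthy": {"fever": 0.01, "cough": 0.05},
}

def posterior(observed):
    """Return P(Hi | evidence) for every Hi under the naive Bayes assumption.

    observed maps each evidence name to True (present) or False (absent).
    """
    scores = {}
    for h, prior in priors.items():
        scores[h] = prior * prod(
            likelihoods[h][e] if present else 1.0 - likelihoods[h][e]
            for e, present in observed.items()
        )
    alpha = 1.0 / sum(scores.values())  # normalizing constant
    return {h: alpha * s for h, s in scores.items()}

post = posterior({"fever": True, "cough": True})
best = max(post, key=post.get)  # hypothesis with the highest posterior
print(best, {h: round(p, 3) for h, p in post.items()})
```

Normalizing by the sum of the scores is valid only because the hypotheses are assumed mutually exclusive and exhaustive.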
Cannot easily handle multi-fault situations, nor
cases where intermediate (hidden) causes exist:
Disease D causes syndrome S, which causes correlated
manifestations M1 and M2
Consider a composite hypothesis H1 ∧ H2, where H1
and H2 are independent. What’s the relative posterior?
P(H1 ∧ H2 | E1, …, El) = α P(E1, …, El | H1 ∧ H2) P(H1 ∧ H2)
= α P(E1, …, El | H1 ∧ H2) P(H1) P(H2)
= α Π(j=1..l) P(Ej | H1 ∧ H2) P(H1) P(H2)
How do we compute P(Ej | H1 ∧ H2)?
Assume H1 and H2 are independent, given E1, …, El?
P(H1 ∧ H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)
This is a very unreasonable assumption
Earthquake and Burglar are independent, but not given Alarm:
P(burglar | alarm, earthquake) << P(burglar | alarm)
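The inequality above can be seen in a small numeric model. The sketch below builds a joint from independent priors for burglar and earthquake plus a table for P(alarm | burglar, earthquake). All of the numbers are hypothetical, picked only to illustrate the effect; none of them come from the slides.

```python
from itertools import product

# HYPOTHETICAL parameters: independent causes and a shared effect.
P_B, P_E = 0.001, 0.002
P_ALARM = {  # P(alarm | burglar, earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

def joint(b, e, a):
    """P(b, e, a) = P(b) * P(e) * P(a | b, e)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_ALARM[(b, e)] if a else 1 - P_ALARM[(b, e)]
    return pb * pe * pa

def p_burglar(alarm, earthquake=None):
    """P(burglar | alarm) or, if earthquake is given, P(burglar | alarm, earthquake)."""
    es = [True, False] if earthquake is None else [earthquake]
    numerator = sum(joint(True, e, alarm) for e in es)
    denominator = sum(joint(b, e, alarm) for b, e in product([True, False], es))
    return numerator / denominator

p_given_alarm = p_burglar(alarm=True)
p_given_both = p_burglar(alarm=True, earthquake=True)
print(round(p_given_alarm, 4), round(p_given_both, 4))
```

Once the earthquake is known, it accounts for the alarm, so the burglar hypothesis loses most of its support even though the two causes are independent a priori.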
Another limitation is that simple application of Bayes’s rule doesn’t
allow us to handle causal chaining:
A: this year’s weather; B: cotton production; C: next year’s cotton price
A influences C indirectly: A→ B → C
P(C | B, A) = P(C | B)
Need a richer representation to model interacting hypotheses, conditional
independence, and causal chaining
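The chain property P(C | B, A) = P(C | B) can be verified on a toy model. In the sketch below the joint is built as P(A) P(B | A) P(C | B); the three Boolean variables stand for good weather, high production, and high price, and every number is hypothetical.

```python
from itertools import product

# HYPOTHETICAL parameters for the chain A -> B -> C.
P_A = 0.3                              # P(a): good weather this year
P_B_GIVEN_A = {True: 0.8, False: 0.4}  # P(b | A): high cotton production
P_C_GIVEN_B = {True: 0.2, False: 0.7}  # P(c | B): high price next year

def bern(p, value):
    """Probability that a Boolean variable with P(True) = p takes this value."""
    return p if value else 1 - p

def joint(a, b, c):
    """P(a, b, c) = P(a) * P(b | a) * P(c | b)."""
    return bern(P_A, a) * bern(P_B_GIVEN_A[a], b) * bern(P_C_GIVEN_B[b], c)

def p_c_given(b, a=None):
    """P(c | b) or, if a is given, P(c | b, a), computed from the joint."""
    a_values = [True, False] if a is None else [a]
    numerator = sum(joint(x, b, True) for x in a_values)
    denominator = sum(joint(x, b, c) for x, c in product(a_values, [True, False]))
    return numerator / denominator

# Knowing A adds nothing once B is known.
print(round(p_c_given(True), 3), round(p_c_given(True, a=True), 3), round(p_c_given(True, a=False), 3))
```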
Next: conditional independence and Bayesian networks!
Probability is a rigorous formalism for uncertain knowledge
Joint probability distribution specifies probability of every
atomic event
Can answer queries by summing over atomic events
But we must find a way to reduce the joint size for
non-trivial domains
Bayes’ rule lets unknown probabilities be computed
from known conditional probabilities, usually in the
causal direction
Independence and conditional independence provide
the tools
Reasoning with Bayesian
Belief Networks
Bayesian Belief Networks (BBNs) can reason
with networks of propositions and associated probabilities
Useful for many AI problems
Expert systems
BBN Definition
AKA Bayesian Network, Bayes Net
A graphical model (as a DAG) of probabilistic relationships
among a set of random variables
Links represent direct influence of one variable on another
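One simple way to hold the graph part of such a model is a map from each variable to its parents. The sketch below uses the variable names from the example network in the following slides, but the specific links are an assumption made for illustration, since the figure itself is not reproduced here. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+), which raises `CycleError` if the graph is not a DAG.

```python
from graphlib import TopologicalSorter

# Each variable maps to the list of its parents (direct influences).
# The link structure is ASSUMED for illustration.
parents = {
    "Age": [],
    "Gender": [],
    "Exposure to Toxics": ["Age"],
    "Smoking": ["Age", "Gender"],
    "Cancer": ["Exposure to Toxics", "Smoking"],
    "Serum Calcium": ["Cancer"],
    "Lung Tumor": ["Cancer"],
}

def topological_order(parent_map):
    """Return the variables so that every parent precedes its children.

    TopologicalSorter raises graphlib.CycleError if there is a cycle,
    so a successful call also confirms the graph is a DAG.
    """
    return list(TopologicalSorter(parent_map).static_order())

order = topological_order(parents)
links = [(parent, child) for child, ps in parents.items() for parent in ps]
print(order)
print(len(links), "links")
```

A parents-first ordering is useful later because a node's conditional probability can only be evaluated once its parents' values are known.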
Recall Bayes Rule
P(H, E) = P(H | E) P(E) = P(E | H) P(H)
P(H | E) = P(E | H) P(H) / P(E)
[Figure: Bayesian network with nodes Age, Gender, Exposure to Toxics, Smoking, Serum Calcium, Lung Tumor]
More Complex Bayesian Network
[Figure: network over Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor]
Nodes represent variables
Links represent “causal” relations
• Does gender cause smoking?
• Influence might be a more appropriate term
More Complex Bayesian Network
[Figure: the same network, with Serum Calcium and Lung Tumor marked as observable symptoms]
Age and Gender are independent.
Explaining a condition in by one or more
To which we can add a fourth:
Deciding on an action based on the probabilities
of the conditions
Predictive Inference
How likely are elderly males to get malignant cancer?
[Figure: network over Age, Gender, Exposure to Toxics, Smoking, Serum Calcium, Lung Tumor]
Predictive and diagnostic