Bayes Reasoning

Bayesian
Reasoning
Thomas Bayes, 1701-1761
1
Adapted from slides by Tim Finin
Today’s topics
 Review probability theory
 Bayesian inference
 From the joint distribution
 Using independence/factoring
 From sources of evidence
 Bayesian Nets
2
Sources of Uncertainty
 Uncertain inputs -- missing and/or noisy data
 Uncertain knowledge
 Multiple causes lead to multiple effects
 Incomplete enumeration of conditions or effects
 Incomplete knowledge of causality in the domain
 Probabilistic/stochastic effects
 Uncertain outputs
 Abduction and induction are inherently uncertain
 Default reasoning, even deductive, is uncertain
 Incomplete deductive inference may be uncertain
Probabilistic reasoning only gives probabilistic results

(summarizes uncertainty from various sources)
3
Decision making with
uncertainty
Rational behavior:
 For each possible action, identify the possible outcomes
 Compute the probability of each outcome
 Compute the utility of each outcome
 Compute the probability-weighted (expected) utility
over possible outcomes for each action

 Select action with the highest expected utility (principle
of Maximum Expected Utility)
4
Why probabilities anyway?
Kolmogorov showed that three simple axioms lead to the
rules of probability theory
1.All probabilities are between 0 and 1:
0 ≤ P(a) ≤ 1
2.Valid propositions (tautologies) have probability 1, and
unsatisfiable propositions have probability 0:

P(true) = 1 ; P(false) = 0
3.The probability of a disjunction is given
by:
P(a  b) = P(a) + P(b) – P(a  b) a ab b
5
Probability theory 101
 Random variables  Alarm, Burglary, Earthquake
 Domain  Boolean (like these), discrete, continuous
 Atomic event: complete  Alarm=TBurglary=TEarthquake=F
specification of state alarm  burglary  ¬earthquake
 Prior probability:  P(Burglary) = 0.1
degree of belief without
P(Alarm) = 0.1
any other evidence
P(earthquake) = 0.000003
 Joint probability: matrix
of combined probabilities  P(Alarm, Burglary) =
of a set of variables
6
Probability theory
101
 Conditional probability:  P(burglary | alarm) = .47
prob. of effect given causes P(alarm | burglary) = .9
 Computing conditional  P(burglary | alarm) =
probs: P(burglary  alarm) / P(alarm)
 P(a | b) = P(a  b) / P(b) = .09/.19 = .47
 P(b): normalizing constant
 Product rule:  P(burglary  alarm) =
 P(a  b) = P(a | b) * P(b) P(burglary | alarm) * P(alarm)
= .47 * .19 = .09
 Marginalizing:  P(alarm) =
 P(B) = ΣaP(B, a) P(alarm  burglary) +
 P(B) = ΣaP(B | a) P(a) P(alarm  ¬burglary)
7
(conditioning) = .09+.1 = .19
Example: Inference from the joint
P(burglary | alarm) = α P(burglary, alarm)

= α [P(burglary, alarm, earthquake) + P(burglary, alarm, ¬earthquake)
= α [ (.01, .01) + (.08, .09) ]
= α [ (.09, .1) ]
Since P(burglary | alarm) + P(¬burglary | alarm) = 1, α = 1/(.09+.1) = 5.26
(i.e., P(alarm) = 1/α = .19)
P(burglary | alarm) = .09 * 5.26 = .474
P(¬burglary | alarm) = .1 * 5.26 = .526 8
Exercise:
Inference from the joint
 Queries:
 What is the prior probability of smart? 0.8
 What is the prior probability of study? 0.6
 What is the conditional probability of prepared,
given study and smart?
P(prepared,smart,study)/P(smart,study) = 0.9
9
Independence
 When sets of variables don’t affect each others’ probabilities,
we call them independent, and can easily compute their joint
and conditional probability:
Independent(A, B) → P(AB) = P(A) * P(B), P(A | B) = P(A)
 {moonPhase, lightLevel} might be independent of {burglary,
alarm, earthquake}
Maybe not: crooks may be more likely to burglarize houses
during a new moon (and hence little light)

But if we know the light level, the moon phase doesn’t affect
whether we are burglarized

If burglarized, light level doesn’t affect if alarm goes off
 Need a more complex notion of independence and methods

10for reasoning about the relationships
Exercise: Independence
Query: Is smart independent of study?

•P(smart|study) == P(smart)
•P(smart|study) = P(smart study)/P(study)
•P(smart|study) = (.432 + .048)/(.432 + .048 + .084 + .036) = .48/.6 =
0.8
•P(smart) = .432 + .16 + .048 + .16 = 0.8 INDEPENDENT!
11
Conditional independence
 Absolute independence:
 A and B are independent if P(A  B) = P(A) * P(B);
equivalently, P(A) = P(A | B) and P(B) = P(B | A)
 A and B are conditionally independent given C if
 P(A  B | C) = P(A | C) * P(B | C)
 This lets us decompose the joint distribution:
 P(A  B  C) = P(A | C) * P(B | C) * P(C)
 Moon-Phase and Burglary are conditionally
independent given Light-Level
 Conditional independence is weaker than absolute
independence, but still useful in decomposing the
12
full joint probability distribution
Exercise: Conditional
independence
Queries:
Is smart conditionally independent of prepared, given study?
–P(smart prepared | study) == P(smart | study) * P(prepared | study)
–P(smart  prepared | study) = P(smart  prepared  study) / P(study)
= .432/ (.432 + .048 + .084 + .036) = .432/.6 = .72
-P(smart | study) * P(prepared | study) = .8 * .86 = .688 NOT!
13
Bayes’ rule
 Derived from the product rule:
 P(C | E) = P(E | C) * P(C) / P(E)
 Often useful for diagnosis:
 If E are (observed) effects and C are (hidden) causes,
 We may have a model for how causes lead to effects
(P(E | C))
 We may also have prior beliefs (based on experience)
about the frequency of occurrence of effects (P(C))
 Which allows us to reason abductively from effects
14 to causes (P(C | E))
Ex: meningitis and stiff neck
 Meningitis (M) can cause a a stiff neck (S), though
there are many other causes for S, too
 We’d like to use S as a diagnostic symptom and
estimate p(M|S)
 Studies can easily estimate p(M), p(S) and p(S|M)
p(S|M)=0.7, p(S)=0.01, p(M)=0.00002
 Applying Bayes’ Rule:
p(M|S) = p(S|M) * p(M) / p(S) = 0.0014
15
Bayesian inference
 In the setting of diagnostic/evidential reasoning

Hi P( Hi ) hypotheses
P( E j | Hi )
E1 Ej Em evidence/manifestations
P( Hi )
 Know prior probability of hypothesis P( E j | Hi )
conditional probability P( H i | E j )
 Want to compute the posterior probability
 Bayes’s theorem (formula 1):
16 P(Hi | E j ) = P(Hi )* P(E j | Hi ) / P(E j )
Simple Bayesian
diagnostic reasoning
 Also known as: Naive Bayes classifier
 Knowledge base:
 Evidence / manifestations: E1, … Em
 Hypotheses / disorders: H1, … Hn
Note: Ej and Hi are binary; hypotheses are mutually exclusive (non-
overlapping) and exhaustive (cover all possible cases)
 Conditional probabilities: P(Ej | Hi), i = 1, … n; j = 1, … m
 Cases (evidence for a particular instance): E1, …, El
 Goal: Find the hypothesis Hi with the highest posterior
 Maxi P(Hi | E1, …, El)
17
Simple Bayesian
diagnostic reasoning
 Bayes’ rule says that
P(Hi | E1… Em) = P(E1…Em | Hi) P(Hi) / P(E1… Em)
 Assume each evidence Ei is conditionally indepen-
dent of the others, given a hypothesis Hi, then:
P(E1…Em | Hi) = mj=1 P(Ej | Hi)
 If we only care about relative probabilities for the
Hi, then we have:
P(Hi | E1…Em) = α P(Hi) mj=1 P(Ej | Hi)
18
Limitations
 Cannot easily handle multi-fault situations, nor
cases where intermediate (hidden) causes exist:
 Disease D causes syndrome S, which causes correlated
manifestations M1 and M2
 Consider a composite hypothesis H1H2, where H1
and H2 are independent. What’s the relative posterior?
P(H1  H2 | E1, …, El) = α P(E1, …, El | H1  H2) P(H1  H2)
= α P(E1, …, El | H1  H2) P(H1) P(H2)
= α lj=1 P(Ej | H1  H2) P(H1) P(H2)
 How do we compute P(Ej | H1H2) ?
19
Limitations
 Assume H1 and H2 are independent, given E1, …, El?
 P(H1  H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)
 This is a very unreasonable assumption
 Earthquake and Burglar are independent, but not given Alarm:
 P(burglar | alarm, earthquake) << P(burglar | alarm)
 Another limitation is that simple application of Bayes’s rule doesn’t
allow us to handle causal chaining:
 A: this year’s weather; B: cotton production; C: next year’s cotton price
 A influences C indirectly: A→ B → C
 P(C | B, A) = P(C | B)
 Need a richer representation to model interacting hypotheses, conditional
independence, and causal chaining
 Next: conditional independence and Bayesian networks!
20
Summary
 Probability is a rigorous formalism for uncertain
knowledge
 Joint probability distribution specifies probability of every
atomic event
 Can answer queries by summing over atomic events
 But we must find a way to reduce the joint size for non-
trivial domains
 Bayes’ rule lets unknown probabilities be computed
from known conditional probabilities, usually in the
causal direction
 Independence and conditional independence provide
the tools
21
Reasoning with Bayesian
Belief Networks
Overview
 Bayesian Belief Networks (BBNs) can reason
with networks of propositions and associated
probabilities
 Useful for many AI problems
 Diagnosis
 Expert systems
 Planning
 Learning
BBN Definition
 AKA Bayesian Network, Bayes Net
 A graphical model (as a DAG) of probabilistic relationships
among a set of random variables
 Links represent direct influence of one variable on another
source
Recall Bayes Rule
P( H , E ) = P( H | E ) P( E ) = P( E | H ) P( H )
P( E | H ) P( H )
P( H | E ) =
P( E )
Note the symmetry: we can compute

the probability of a hypothesis given
its evidence and vice versa.
Simple Bayesian Network
S   no, light , heavy Smoking Cancer
P(S=no) 0.80 C   none, benign, malignant 

P(S=light) 0.15
P(S=heavy) 0.05
Smoking= no light heavy
P(C=none) 0.96 0.88 0.60
P(C=benign) 0.03 0.08 0.25
P(C=malig) 0.01 0.04 0.15
More Complex Bayesian Network
Age Gender
Exposure Smoking
to Toxics
Cancer
Serum Lung
Calcium Tumor
Nodes
represent Age Gender
variables
Exposure Smoking
to Toxics
Links represent
Cancer “causal” relations
• Does gender cause
smoking?
• Influence might be a Serum Lung
more appropriate term Calcium Tumor
predispositions
Age Gender
Exposure Smoking
to Toxics
Cancer
Serum Lung
Calcium Tumor
Age Gender
Exposure Smoking
to Toxics
condition
Cancer
Serum Lung
Calcium Tumor
Age Gender
Exposure Smoking
to Toxics
Cancer
observable symptoms
Serum Lung
Calcium Tumor
Independence
Age and Gender are
Age Gender independent.
P(A,G) = P(G) P(A)
P(A |G) = P(A)

P(G |A) = P(G)
P(A,G) = P(G|A) P(A) = P(G)P(A)

P(A,G) = P(A|G) P(G) = P(A)P(G)
Conditional Independence
Cancer is independent
Age Gender
of Age and Gender
given Smoking
Smoking
P(C | A,G,S) = P(C|S)

Cancer
Conditional Independence: Naïve Bayes
Serum Calcium and Lung

Cancer Tumor are dependent
Serum Calcium is
independent of Lung Tumor,
Serum Lung given Cancer
Calcium Tumor
P(L | SC,C) = P(L|C)
P(SC | L,C) = P(SC|C)
Naïve Bayes assumption: evidence (e.g., symptoms) is indepen-

dent given the disease. This makes it easy to combine evidence
Explaining Away
Exposure to Toxics and Smoking are
Exposure Smoking independent
to Toxics
Exposure to Toxics is dependent on
Cancer Smoking, given Cancer
P(E=heavy|C=malignant) > P(E=heavy|

C=malignant, S=heavy)
• Explaining away: reasoning pattern where confirmation of one

cause of an event reduces need to invoke alternatives
• Essence of Occam’s Razor
Conditional Independence
A variable (node) is conditionally independent
of its non-descendants given its parents
Age Gender Non-Descendants
Exposure Smoking Parents Cancer is independent

to Toxics
of Age and Gender
Cancer given Exposure to
Toxics and Smoking.
Serum Lung
Calcium Tumor Descendants
Another non-descendant
A variable is
Age Gender
conditionally
independent of its
Exposure Smoking non-descendants
to Toxics given its parents
Diet Cancer Cancer is independent

of Diet given Exposure
Serum Lung to Toxics and Smoking
Calcium Tumor
Three kinds of reasoning
BBNs support three main kinds of reasoning:
Predicting conditions given predispositions
Diagnosing conditions given symptoms (and
predisposing)
Explaining a condition in by one or more
predispositions
To which we can add a fourth:
Deciding on an action based on the probabilities
of the conditions
Predictive Inference
Age Gender
How likely are elderly males
Exposure Smoking
to get malignant cancer?
to Toxics
Cancer P(C=malignant | Age>60, Gender=male)
Serum Lung
Calcium Tumor
Predictive and diagnostic
combined
Age Gender How likely is an elderly

male patient with high
Exposure Serum Calcium to have
Smoking
to Toxics malignant cancer?
Cancer P(C=malignant | Age>60,

Gender= male, Serum Calcium = high)
Serum Lung
Calcium Tumor
Explaining away
Age Gender  If we see a lung tumor, the
probability of heavy
smoking and of exposure
Exposure Smoking
to Toxics to toxics both go up.
Cancer • If we then observe heavy

smoking, the probability of
Serum exposure to toxics goes
Lung
Calcium Tumor back down.
Decision making
 Decision - an irrevocable allocation of
domain resources
 Decision should be made so as to maximize
expected utility.
 View decision making in terms of
 Beliefs/Uncertainties
 Alternatives/Decisions
 Objectives/Utilities
A Decision Problem
Should I have my party inside
or outside?
dry Regret
in
wet Relieved
dry
Perfect!
out
wet Disaster
Value Function
A numerical score over all possible states of the
world allows BBN to be used to make decisions
Location? Weather? Value

in dry $50
in wet $60
out dry $100
out wet $0
Two software tools
 Netica: Windows app for working with Bayes-
ian belief networks and influence diagrams
 A commercial product but free for small networks
 Includes a graphical editor, compiler, inference
engine, etc.
 Samiam: Java system for modeling and
reasoning with Bayesian networks
 Includes a GUI and reasoning engine

Bayes Reasoning

Uploaded by

Copyright:

Available Formats

Bayes Reasoning

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Bayes Reasoning

Uploaded by

Copyright:

Available Formats

Bayesian

Thomas Bayes, 1701-1761

 From sources of evidence

 Incomplete enumeration of conditions or effects

 Incomplete knowledge of causality in the domain

 Default reasoning, even deductive, is uncertain

 Incomplete deductive inference may be uncertain

Probabilistic reasoning only gives probabilistic results

 Compute the probability of each outcome

 Compute the utility of each outcome

 Compute the probability-weighted (expected) utility

over possible outcomes for each action

of Maximum Expected Utility)

unsatisfiable propositions have probability 0:

P(burglary | alarm) = α P(burglary, alarm)

during a new moon (and hence little light)

whether we are burglarized

 Need a more complex notion of independence and methods

Query: Is smart independent of study?

 In the setting of diagnostic/evidential reasoning

Note the symmetry: we can compute

P(S=no) 0.80 C   none, benign, malignant 

P(A,G) = P(G) P(A)

P(A |G) = P(A)

P(A,G) = P(G|A) P(A) = P(G)P(A)

P(C | A,G,S) = P(C|S)

Serum Calcium and Lung

Naïve Bayes assumption: evidence (e.g., symptoms) is indepen-

P(E=heavy|C=malignant) > P(E=heavy|

• Explaining away: reasoning pattern where confirmation of one

Age Gender Non-Descendants

Exposure Smoking Parents Cancer is independent

Diet Cancer Cancer is independent

Diagnosing conditions given symptoms (and

Cancer P(C=malignant | Age>60, Gender=male)

Age Gender How likely is an elderly

Cancer P(C=malignant | Age>60,

Cancer • If we then observe heavy

Location? Weather? Value

You might also like