15-381 Artificial Intelligence: Representation and Problem Solving Homework 2 - Solutions
Solution
The key idea is to recognize that the outcomes HT and TH occur with equal probability p(1 − p). If we
associate HT with "Heads" and TH with "Tails", the algorithm is as follows:
1. Toss the coin twice.
2. If the result is HT, return "Heads"; if TH, return "Tails"; otherwise go to step 1.
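This rejection scheme can be checked empirically. Below is a minimal Python sketch (the bias 0.8 and the function names are illustrative, not part of the assignment):

```python
import random

def biased_flip(p):
    """One toss of a coin with P(Heads) = p."""
    return random.random() < p

def fair_flip(p):
    """Simulate a fair coin from a coin of unknown bias p (von Neumann's trick).
    HT and TH each occur with probability p*(1-p), so conditioning on the two
    tosses differing yields exactly 1/2 each."""
    while True:
        a, b = biased_flip(p), biased_flip(p)
        if a != b:
            return a  # a is True for HT ("Heads"), False for TH ("Tails")

random.seed(0)
n = 100_000
heads = sum(fair_flip(0.8) for _ in range(n))
print(heads / n)  # close to 0.5 despite the 0.8-biased coin
```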
one parameter. Now, consider N+1 binary random variables X1 . . . XN , Y that factorize according to Fig. 1
1. Suppose you wish to store the joint probability distribution of these N+1 variables as a single table.
How many parameters will you need to represent this table?
2. Now, suppose you utilize the fact that the joint distribution factorizes according to the Bayes
Network. How many parameters will you need to completely describe the distribution if you use the
Bayesian Network representation? In other words, how many parameters are needed to fully specify
the values of all the conditional probability tables in this Bayesian Network?
Solution
1. There are N+1 boolean variables. If we have a parameter for every possible instantiation of the
variables, there will be 2^(N+1) parameters. But these parameters must sum to one, so we can drop
one of the 2^(N+1). The answer is 2^(N+1) − 1.
2. The CPT of Y needs one parameter (it is a boolean variable without any parents). For each X, we
need 2 parameters (e.g., P (X|Y ) and P (X|¬Y )). Therefore, we have 2N + 1 parameters in total.
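The two counts can be compared directly; a small Python sketch (the function names are ours):

```python
def full_table_params(n):
    """Parameters for the full joint over N+1 boolean variables: 2^(N+1) - 1."""
    return 2 ** (n + 1) - 1

def factored_params(n):
    """Parameters for the factored network: 1 for P(Y), plus 2 per X_i
    (one for P(X_i|Y), one for P(X_i|not Y))."""
    return 2 * n + 1

# The factored representation grows linearly rather than exponentially in N.
for n in (1, 5, 20):
    print(n, full_table_params(n), factored_params(n))
```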
Solution
n(n − 1)/2. Proof by construction: consider a BN over X1 , X2 , . . . Xn such that there is an edge from
Xi to Xj ∀j > i. The total number of edges in this graph is (n − 1) + (n − 2) + . . . + 0 = n(n − 1)/2. To show
that this graph has no directed cycle, assume the contrary and suppose there is a cycle of the
form Xi1 , Xi2 , . . . Xim , Xi1 ; by construction of the graph, we have i1 < i2 < . . . < im < i1 , leading to i1 < i1 , which is
a contradiction. Therefore no cycle exists.
Additionally, you cannot construct a BN with more than n(n − 1)/2 edges: any directed graph with
more than n(n − 1)/2 edges must have at least one pair of vertices joined by more than one edge,
implying at least one edge in each direction between them, which is a cycle.
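The construction in the proof can be sanity-checked in code. The following Python sketch (helper names are ours) builds the edge set {Xi → Xj : j > i}, counts its edges, and verifies acyclicity with a depth-first search:

```python
def complete_dag(n):
    """Edges Xi -> Xj for all j > i: the densest possible BN structure."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def is_acyclic(n, edges):
    """Depth-first-search cycle check on a directed graph with nodes 0..n-1."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
    state = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = finished

    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1:                 # back edge -> directed cycle
                return False
            if state[v] == 0 and not dfs(v):
                return False
        state[u] = 2
        return True

    return all(dfs(i) for i in range(n) if state[i] == 0)

for n in (2, 3, 6):
    edges = complete_dag(n)
    assert len(edges) == n * (n - 1) // 2    # exactly n(n-1)/2 edges
    assert is_acyclic(n, edges)              # and no directed cycle
assert not is_acyclic(2, [(0, 1), (1, 0)])   # opposite edges form a cycle
```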
The Bayesian networks in Fig. 2 are all part of the alarm network introduced in class and in Russell and
Norvig. We use the notation X⊥Y to denote the variable X being independent of Y , and X⊥Y |Z to denote
X being independent of Y given Z.
1. For each of these three networks, write the implied factored joint distribution, in the form p(X, Y ) =
p(X)p(Y |X).
2. Using the joint distribution you wrote down for Fig. 2(i), write down a formula for P (B, E).
3. Now prove that B⊥E.
4. Similarly, prove that B⊥M |A in the Bayesian network of Fig. 2(ii), and M ⊥J|A in the Bayesian
network of Fig. 2(iii).
Solution
1. (i) P (B, A, E) = P (B)P (E|B)P (A|B, E) (chain rule) = P (B)P (E)P (A|B, E) (using the fact that
B⊥E from the BN structure). Similarly, (ii) P (B, A, M ) = P (B)P (A|B)P (M |A) and (iii) P (M, A, J) =
P (A)P (M |A)P (J|A).
2. P (B, E) = Σ_a P (B, a, E) = Σ_a P (B)P (E)P (a|B, E) = P (B)P (E) Σ_a P (a|B, E) = P (B)P (E). Since
P (a|B, E) is a conditional probability distribution, the sum over all values of a is 1.
3. From the solution to part 2, P (B, E) = P (B)P (E); therefore B⊥E by the definition of independence.
4. By Bayes' rule, P (B)P (A|B) = P (A)P (B|A). Therefore P (B, A, M ) = P (B)P (A|B)P (M |A) =
P (A)P (B|A)P (M |A), and so P (B, M |A) = P (B, A, M )/P (A) = P (A)P (B|A)P (M |A)/P (A) = P (B|A)P (M |A).
Therefore B⊥M |A.
Similarly, P (M, A, J) = P (A)P (M |A)P (J|A). Therefore P (M, J|A) = P (M, A, J)/P (A) = P (A)P (M |A)P (J|A)/P (A) =
P (M |A)P (J|A). Therefore M ⊥J|A.
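The marginalization argument in parts 2–4 holds for any CPT values. A small Python sketch (the numbers 0.001 and 0.002 and the randomly generated P (A|B, E) entries are illustrative) verifies P (B, E) = P (B)P (E) for network (i):

```python
import itertools
import random

random.seed(0)
vals = (True, False)

# Illustrative CPT entries for network (i): B and E have no parents,
# A has parents {B, E}. The independence proof does not depend on these numbers.
pB, pE = 0.001, 0.002
pA = {(b, e): random.random() for b in vals for e in vals}  # P(A = T | b, e)

def joint(b, a, e):
    """P(B=b, A=a, E=e) = P(b) P(e) P(a|b, e), the factorization from part 1(i)."""
    pb = pB if b else 1 - pB
    pe = pE if e else 1 - pE
    pa = pA[b, e] if a else 1 - pA[b, e]
    return pb * pe * pa

for b, e in itertools.product(vals, repeat=2):
    marg = sum(joint(b, a, e) for a in vals)  # sum out A, as in part 2
    pb = pB if b else 1 - pB
    pe = pE if e else 1 - pE
    assert abs(marg - pb * pe) < 1e-12        # P(B, E) = P(B) P(E), so B ⊥ E
```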
Consider the Bayesian Network given in Fig. 3(i). Assume that each of the variables is boolean valued.
For each of the following, state the total number of operations (multiplications and additions) the variable
elimination algorithm will take to compute the answer. For example, if A and B are binary variables,
Σ_a P (B)P (a|B) will take two multiplications (P (B) ∗ P (A|B) and P (B) ∗ P (¬A|B)) and one addition (adding
those two terms). Assume that the algorithm avoids unnecessary computation, so any summations that are
irrelevant to the query will be avoided (see the textbook for an example of an irrelevant summation).
1. P (X3 |X4 ) i.e., the probability that X3 is true given that X4 is true. Assume that the variables are
eliminated in the order: X5 , X1 , X2 .
2. Solution:
P (X3 |X4 ) = P (X3 , X4 )/P (X4 ) = P (X3 , X4 )/(P (X3 , X4 ) + P (X3 , ¬X4 ))
P (X3 , X4 ) = Σ_{x2} Σ_{x1} Σ_{x5} P (x1 , x2 , X3 , X4 , x5 )
P (x1 , x2 , X3 , X4 , x5 ) = P (x1 )P (X3 |x1 )P (x2 |x1 , X3 )P (X4 |x2 )P (x5 |x2 )
P (X3 , X4 ) = Σ_{x2} P (X4 |x2 ) [Σ_{x1} P (x1 )P (X3 |x1 )P (x2 |x1 , X3 )] [Σ_{x5} P (x5 |x2 )]
Let f1 (x2 ) = Σ_{x1} P (x1 )P (X3 |x1 )P (x2 |x1 , X3 ); the sum Σ_{x5} P (x5 |x2 ) equals 1 and need not
be computed. Computing f1 (x2 ) for one choice of x2 takes 2∗2 multiplications and 1 addition = 5 operations.
Since x2 can take two values, this takes 10 operations in total. Computing P (X3 , X4 ) =
Σ_{x2} P (X4 |x2 )f1 (x2 ) takes another 1∗2 multiplications and 1 addition = 3 operations. There-
fore, computing P (X3 , X4 ) takes 13 operations. Similarly, P (X3 , ¬X4 ) takes another 13 opera-
tions, and finally computing P (X3 , X4 )/(P (X3 , X4 ) + P (X3 , ¬X4 )) takes another 1 addition (and
1 division). In total, this takes 27 operations (28 including the division).
3. P (X3 |X4 ), but this time with the elimination ordering X2 , X1 , X5 .
4. Solution:
P (X3 |X4 ) = P (X3 , X4 )/P (X4 ) = P (X3 , X4 )/(P (X3 , X4 ) + P (X3 , ¬X4 ))
P (X3 , X4 ) = Σ_{x5} Σ_{x1} Σ_{x2} P (x1 , x2 , X3 , X4 , x5 )
P (X3 , X4 ) = Σ_{x5} Σ_{x1} P (x1 )P (X3 |x1 ) Σ_{x2} P (x5 |x2 )P (x2 |x1 , X3 )P (X4 |x2 )
Let g1 (x5 , x1 ) = Σ_{x2} P (x5 |x2 )P (x2 |x1 , X3 )P (X4 |x2 ).
Computing g1 (x5 , x1 ) takes 2∗2 multiplications and 1 addition = 5 operations for each choice of (x5 , x1 ).
Since there are four such pairs, this takes 20 operations.
P (X3 , X4 ) = Σ_{x5} Σ_{x1} P (x1 )P (X3 |x1 )g1 (x5 , x1 )
Computing g2 (x5 ) = Σ_{x1} P (x1 )P (X3 |x1 )g1 (x5 , x1 ) takes 2∗2 + 1 = 5 operations for each choice of x5 . Since
there are two such values of x5 , this takes 10 operations. Computing Σ_{x5} g2 (x5 ) takes another
1 addition, leading to a total of 31 operations to compute P (X3 , X4 ). Computing P (X3 |X4 )
therefore takes 31∗2 + 1 (addition) + 1 (division) = 63 operations (64 including the division).
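Both elimination orderings must produce the same value of P (X3 , X4 ); only the operation counts differ. The following Python sketch (with randomly generated, purely illustrative CPTs matching the factorization used above) computes the two nested-sum expressions and checks that they agree with brute-force enumeration:

```python
import random

random.seed(0)
B = (True, False)

# Illustrative CPTs for the factorization used above:
# P(x1, x2, X3, X4, x5) = P(x1) P(X3|x1) P(x2|x1, X3) P(X4|x2) P(x5|x2)
p_x1 = {True: 0.3, False: 0.7}
t = {x1: random.random() for x1 in B}
p_x3 = {(x3, x1): t[x1] if x3 else 1 - t[x1] for x3 in B for x1 in B}
u = {(x1, x3): random.random() for x1 in B for x3 in B}
p_x2 = {(x2, x1, x3): u[x1, x3] if x2 else 1 - u[x1, x3]
        for x2 in B for x1 in B for x3 in B}
v = {x2: random.random() for x2 in B}
p_x4 = {(x4, x2): v[x2] if x4 else 1 - v[x2] for x4 in B for x2 in B}
w = {x2: random.random() for x2 in B}
p_x5 = {(x5, x2): w[x2] if x5 else 1 - w[x2] for x5 in B for x2 in B}

X3 = X4 = True  # the query fixes X3 and X4 to true

# Ordering X5, X1, X2 (part 2): f1(x2) = sum_x1 P(x1) P(X3|x1) P(x2|x1, X3)
f1 = {x2: sum(p_x1[x1] * p_x3[X3, x1] * p_x2[x2, x1, X3] for x1 in B) for x2 in B}
ans1 = sum(p_x4[X4, x2] * f1[x2] * sum(p_x5[x5, x2] for x5 in B) for x2 in B)

# Ordering X2, X1, X5 (part 4): g1, then g2, then the sum over x5
g1 = {(x5, x1): sum(p_x5[x5, x2] * p_x2[x2, x1, X3] * p_x4[X4, x2] for x2 in B)
      for x5 in B for x1 in B}
g2 = {x5: sum(p_x1[x1] * p_x3[X3, x1] * g1[x5, x1] for x1 in B) for x5 in B}
ans2 = sum(g2[x5] for x5 in B)

# Both orderings agree with each other and with brute-force enumeration.
brute = sum(p_x1[x1] * p_x3[X3, x1] * p_x2[x2, x1, X3] * p_x4[X4, x2] * p_x5[x5, x2]
            for x1 in B for x2 in B for x5 in B)
assert abs(ans1 - brute) < 1e-12 and abs(ans2 - brute) < 1e-12
```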
5. Now suppose that you had to answer the last two questions, this time with the Bayes Net given in
Fig. 3(ii). Would your answers change? (You don't have to state the number of operations; just whether
they would differ from before, with a brief explanation.)
6. Solution:
In general, X6 would not be irrelevant, and therefore the answer would change.
Enumeration_Ask(X, e, bn) begin
    Data: X, the query variable; e, the observed values for variables E; bn, a Bayes net with variables
    {X} ∪ E ∪ Y
    Result: a distribution over X
    Q(X) ← a distribution over X, initially empty;
    foreach value xi of X do
        extend e with value xi for X;
        Q(xi ) ← Enumerate_All(Vars[bn], e);
    end
    return Normalize(Q(X))
end
Enumerate_All(vars, e) begin
    if Empty?(vars) then
        return 1.0
    end
    Y ← First(vars);
    if Y has value y in e then
        return P (y|Parents(Y )) × Enumerate_All(Rest(vars), e)
    else
        return Σ_y P (y|Parents(Y )) × Enumerate_All(Rest(vars), e_y ), where e_y is e extended with Y = y
    end
end
Algorithm 1: The enumeration algorithm
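The assignment itself asks for a Matlab implementation in enumeration_ask.m; as a language-neutral illustration, Algorithm 1 can be sketched in Python as follows (the dict-based CPT encoding and the toy Rain → Wet network are our own assumptions, not part of the provided code):

```python
def enumerate_all(variables, e, bn):
    """variables: topologically ordered names (parents before children);
    e: dict of observed values; bn: {var: (parents, cpt)} with
    cpt[(value, parent_values)] = P(var = value | parents = parent_values)."""
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    parents, cpt = bn[Y]
    pv = tuple(e[p] for p in parents)  # parents are already assigned in e
    if Y in e:
        return cpt[e[Y], pv] * enumerate_all(rest, e, bn)
    return sum(cpt[y, pv] * enumerate_all(rest, {**e, Y: y}, bn)
               for y in (True, False))

def enumeration_ask(X, e, bn, order):
    """Return the normalized posterior distribution over X given evidence e."""
    q = {x: enumerate_all(order, {**e, X: x}, bn) for x in (True, False)}
    z = sum(q.values())
    return {x: v / z for x, v in q.items()}

# Toy two-node network: Rain -> Wet (all numbers illustrative).
bn = {
    "Rain": ((), {(True, ()): 0.2, (False, ()): 0.8}),
    "Wet":  (("Rain",), {(True, (True,)): 0.9, (False, (True,)): 0.1,
                         (True, (False,)): 0.1, (False, (False,)): 0.9}),
}
post = enumeration_ask("Rain", {"Wet": True}, bn, ("Rain", "Wet"))
print(post[True])  # 0.18 / 0.26, about 0.692
```

Note that `order` must list parents before children, exactly as required of the variable list in the assignment.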
See the README and type help func_name in Matlab (or read the corresponding scripts) for documentation
and examples. Note that the recursive enumeration must be performed from parents to children, i.e.,
the list of variables must be partially ordered such that parents always come before their children. The
function bn_vars will provide the ordered variable list (in fact, the variable indices in the Bayesian
networks given below are already ordered this way, so the ordered list is just 1, 2, . . ., N). Please write
your code by modifying the provided file enumeration_ask.m, which contains a suggested API with
documentation. Matlab is STRONGLY recommended, particularly because writing the necessary support
code in other languages is time consuming.
2. Run your exact inference implementation on the following two Bayesian network inference problems.
It should be straightforward to convert the questions below into function calls to your code
(using the appropriate variable indices labeled on the graphs below); type "help enumeration_ask" for
an example. Both questions ask for the conditional probability of one query variable being true given a
set of evidence variables. Report the run time and results.
(a) The first Bayesian network, created by create_alarm_bn, is the alarm network from Russell and
Norvig. As shown in Fig. 4, variables are indexed from 1 to 5 (denoted X1 to X5), and the CPTs are
the same as in Russell and Norvig, Figure 14.2. Evidence variables are shaded, while query variables
are shaded and circled by a thick line. Calculate the conditional probability p(X1|X4,¬X5). Hint:
first verify your implementation with queries whose answers you know, e.g., p(X1|X4,X5) should be about 0.284.
(b) Solution: p(X1|X4,¬X5) = 0.0051
(c) The second Bayesian network, created by create_pedigree_bn, concerns genetic inference. Consider
a victim V in a plane crash, whose only family members are his half-sister S and the sister's
mother M (not V's mother). Their pedigree is shown below. You need to determine whether
certain remains belong to V based on the genetic fingerprints of S and M. This can be solved with the
Bayesian network shown in Fig. 5, indexed from 1 to 11. Evidence and query variables are shaded,
while normal circles are hidden variables. You do not need to worry about the CPTs if you are
using Matlab; otherwise, the CPTs are explained in the documentation of create_pedigree_bn.m. The
variables (X1, X2, . . ., X8) correspond to unobserved genetic information in so-called Mendelian
inheritance: humans have two copies of each chromosome, one from the father and one from the
mother. During reproduction, one copy (chosen at random) is passed to the next generation.
Assume you cannot determine which copy came from which parent, but can only obtain partial
information via the observed variables (X9, X10, X11). However, you do not have to understand
Mendelian inheritance to solve this problem. Now, using the structure and CPTs provided in the
archive, calculate the conditional probability p(X10|¬X9,X11).
(d) Solution: p(X10|¬X9,X11) = 0.0407