Probability
Required Textbook: DeGroot & Schervish, Probability and Statistics, Third Edition.
Recommended Introduction to Probability Text: Feller, Vol. 1.
1.2-1.4. Probability, Set Operations.
What is probability?
Classical Interpretation: all outcomes have equal probability (coin, dice).
Subjective Interpretation (depends on the nature of the problem): uses a model, with randomness involved (such as weather). Example: a drop of paint falls into a glass of water; a model can describe P(hit bottom before sides). Or P(survival after surgery), which is subjective, estimated by the doctor.
Frequency Interpretation: probability is based on history; P(make a free throw) is based on the history of shots made.
An experiment has a random outcome.
1. Sample Space - the set of all possible outcomes.
coin: $S = \{H, T\}$; die: $S = \{1, 2, 3, 4, 5, 6\}$; two dice: $S = \{(i, j) : i, j = 1, 2, \ldots, 6\}$
2. Events - any subset of the sample space, e.g. $A \subset S$; $\mathcal{A}$ - the collection of all events.
3. Probability Distribution - $P: \mathcal{A} \to [0, 1]$. For an event $A \subset S$, $P(A)$ or $\Pr(A)$ denotes the probability of A.
Properties of Probability:
1. $0 \le P(A) \le 1$
2. $P(S) = 1$
3. For disjoint (mutually exclusive) events A, B (by definition $A \cap B = \emptyset$): $P(A \text{ or } B) = P(A) + P(B)$. This can be written for any number of events: for a sequence of events $A_1, \ldots, A_n, \ldots$ all disjoint ($A_i \cap A_j = \emptyset$, $i \ne j$),
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)$$
In a continuous sample space we need to group outcomes into events, not sum up individual points, since each individual point has $P = 0$.
Union of Sets: $A \cup B = \{s \in S : s \in A \text{ or } s \in B\}$
Intersection: $A \cap B = AB = \{s \in S : s \in A \text{ and } s \in B\}$
Complement: $A^c = \{s \in S : s \notin A\}$
Symmetric Difference: $(A \cap B^c) \cup (B \cap A^c)$
Summary of Set Operations:
1. Union: $A \cup B = \{s \in S : s \in A \text{ or } s \in B\}$
2. Intersection: $A \cap B = AB = \{s \in S : s \in A \text{ and } s \in B\}$
3. Complement: $A^c = \{s \in S : s \notin A\}$
4. Set Difference: $A \setminus B = A - B = \{s \in S : s \in A \text{ and } s \notin B\} = A \cap B^c$
5. Symmetric Difference: $A \triangle B = \{s \in S : (s \in A \text{ and } s \notin B) \text{ or } (s \in B \text{ and } s \notin A)\} = (A \cap B^c) \cup (B \cap A^c)$
Properties of Set Operations:
1. $A \cup B = B \cup A$
2. $(A \cup B) \cup C = A \cup (B \cup C)$
Note that 1. and 2. are also valid for intersections.
3. For mixed operations, the distributive law holds:
$$(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$$
Think of union as addition and intersection as multiplication: $(A + B)C = AC + BC$.
4. $(A \cup B)^c = A^c \cap B^c$ - can be proven by the diagram below:
Both diagrams give the same shaded area.
5. $(A \cap B)^c = A^c \cup B^c$ - prove by looking at a particular point:
$$s \in (A \cap B)^c \iff s \notin A \cap B \iff s \notin A \text{ or } s \notin B \iff s \in A^c \text{ or } s \in B^c \iff s \in A^c \cup B^c \quad \text{QED}$$
** End of Lecture 1
1.5 Properties of Probability.
1. $P(A) \in [0, 1]$
2. $P(S) = 1$
3. $P(\bigcup_i A_i) = \sum_i P(A_i)$ if the events are disjoint ($A_i \cap A_j = \emptyset$, $i \ne j$). The probability of a union of disjoint events is the sum of their probabilities.
4. $P(\emptyset) = 0$: $S$ and $\emptyset$ are disjoint by definition, so $P(S) = P(S \cup \emptyset) = P(S) + P(\emptyset) = 1$; since $P(S) = 1$ by property 2, $P(\emptyset) = 0$.
5. $P(A^c) = 1 - P(A)$: because $A$ and $A^c$ are disjoint, $P(A \cup A^c) = P(S) = 1 = P(A) + P(A^c)$. The sum of the probabilities of an event and its complement is 1.
6. If $A \subset B$, then $P(A) \le P(B)$: by definition $B = A \cup (B \setminus A)$, a union of two disjoint sets, so
$$P(B) = P(A) + P(B \setminus A) \ge P(A)$$
7. $P(A \cup B) = P(A) + P(B) - P(AB)$: we must subtract the intersection because it would otherwise be counted twice, as shown:
Finite Sample Spaces
There are a finite number of outcomes: $S = \{s_1, \ldots, s_n\}$. Define $p_i = P(s_i)$ as the probability function, where
$$p_i \ge 0, \qquad \sum_{i=1}^{n} p_i = 1, \qquad P(A) = \sum_{s \in A} P(s)$$
Classical, simple sample spaces - all outcomes have equal probabilities, so $P(A) = \frac{\#(A)}{\#(S)}$, computed by counting methods.
Multiplication rule: if $\#(s_1) = m$ and $\#(s_2) = n$, then $\#(s_1 s_2) = mn$.
Sampling without replacement (one at a time, order is important): from outcomes $s_1, \ldots, s_n$, choose $k$ ($k \le n$).
$$\#(\text{outcome vectors } (a_1, a_2, \ldots, a_k)) = n(n-1)\cdots(n-k+1) = P_{n,k}$$
Example: order the numbers 1, 2, and 3 in groups of 2; (1, 2) and (2, 1) are different. $P_{3,2} = 3 \cdot 2 = 6$.
$$P_{n,n} = n(n-1)\cdots 1 = n!, \qquad P_{n,k} = \frac{n!}{(n-k)!}$$
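A quick numeric check of these permutation counts (a minimal Python sketch; the helper `P` is our own name, not notation from the text):

```python
import math
from itertools import permutations

# P(n, k) = n! / (n - k)!: ordered samples without replacement
def P(n, k):
    return math.factorial(n) // math.factorial(n - k)

# The lecture's example: ordering 1, 2, 3 in groups of 2
print(P(3, 2))                           # 6
print(list(permutations([1, 2, 3], 2)))  # (1, 2) and (2, 1) counted separately

# Cross-check against brute-force enumeration
assert P(6, 4) == len(list(permutations(range(6), 4)))  # both 360
```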
Sampling without replacement, $k$ at once: from $s_1, \ldots, s_n$, sample a subset of size $k$, $\{b_1, \ldots, b_k\}$, when we aren't concerned with order. Each subset can be ordered in $k!$ ways, so divide that out of $P_{n,k}$:
$$\text{number of subsets} = C_{n,k} = \binom{n}{k} = \frac{n!}{k!(n-k)!}$$
The $C_{n,k}$ are the binomial coefficients.
Binomial Theorem: $(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$
Now distribute $n$ indistinguishable balls into $k$ boxes. Fix the outer walls, and rearrange the balls and the separators: if you fix the outer walls of the first and last boxes, you can rearrange the $k - 1$ separators and the $n$ balls freely. There are $n$ balls and $k - 1$ separators ($k$ boxes).
$$\text{Number of different ways to arrange the balls and separators} = \binom{n+k-1}{n} = \binom{n+k-1}{k-1}$$
Example: for $f(x_1, x_2, \ldots, x_k)$, take $n$ partial derivatives, e.g. $\frac{\partial^n f}{\partial x_1^2\, \partial x_2^5\, \partial x_3 \cdots \partial x_k}$. The $k$ coordinates play the role of the $k$ boxes, and the $n$ partial derivatives play the role of the $n$ balls, so
$$\text{number of different partial derivatives} = \binom{n+k-1}{n} = \binom{n+k-1}{k-1}$$
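A minimal sketch verifying the balls-and-separators count by brute force (the values of n and k are chosen arbitrarily for illustration):

```python
import math
from itertools import combinations_with_replacement

# Ways to place n indistinguishable balls into k boxes: C(n + k - 1, n)
def multisets(n, k):
    return math.comb(n + k - 1, n)

n, k = 5, 3  # illustrative values
# Each size-n multiset drawn from k box labels is one arrangement.
brute_force = len(list(combinations_with_replacement(range(k), n)))
print(multisets(n, k), brute_force)  # both 21
assert multisets(n, k) == brute_force
```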
** End of Lecture 2.
1.9 Multinomial Coefficients
These values are used to split objects into groups of various sizes.
$s_1, s_2, \ldots, s_n$ - $n$ elements, with $n_1$ in group 1, $n_2$ in group 2, ..., $n_k$ in group k, where $n_1 + \cdots + n_k = n$. Choosing the groups one at a time:
$$\binom{n}{n_1}\binom{n-n_1}{n_2}\binom{n-n_1-n_2}{n_3}\cdots\binom{n-n_1-\cdots-n_{k-2}}{n_{k-1}}\binom{n_k}{n_k}$$
$$= \frac{n!}{n_1!(n-n_1)!}\cdot\frac{(n-n_1)!}{n_2!(n-n_1-n_2)!}\cdots\frac{(n-n_1-\cdots-n_{k-2})!}{n_{k-1}!(n-n_1-\cdots-n_{k-1})!}\cdot 1 = \frac{n!}{n_1!\,n_2!\cdots n_{k-1}!\,n_k!} = \binom{n}{n_1, n_2, \ldots, n_k}$$
Further explanation: You have n spots in which you have n! ways to place your elements.
However, you can permute the elements within a particular group and the splitting is still the same.
You must therefore divide out these internal permutations.
This is a distinguishable permutations situation.
Example #1 - 20 members of a club need to be split into 3 committees (A, B, C) of 8, 8, and 4 people,
respectively. How many ways are there to split the club into these committees?
$$\text{ways to split} = \binom{20}{8, 8, 4} = \frac{20!}{8!\,8!\,4!}$$
Example #2 - When rolling 12 dice, what is the probability that 6 pairs are thrown?
This can be thought of as: each of the six numbers appears exactly twice.
There are $6^{12}$ possibilities for the dice throws, as each of the 12 dice has 6 possible values.
For the pairs, the only freedom is where the dice show up:
$$P = \frac{\binom{12}{2,2,2,2,2,2}}{6^{12}} = \frac{12!}{(2!)^6\, 6^{12}} \approx 0.0034$$
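A minimal sketch of both computations (the `multinomial` helper is our own name):

```python
import math

# Multinomial coefficient n! / (n1! n2! ... nk!)
def multinomial(n, parts):
    assert sum(parts) == n
    out = math.factorial(n)
    for part in parts:
        out //= math.factorial(part)
    return out

# Example #1: split 20 club members into committees of 8, 8, 4
print(multinomial(20, [8, 8, 4]))          # 62355150

# Example #2: P(6 pairs when rolling 12 dice)
p = multinomial(12, [2] * 6) / 6**12
print(round(p, 4))                          # 0.0034
```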
1.10 - Calculating a Union of Events
$$P(A \cup B) = P(A) + P(B) - P(AB) \quad \text{(Figure 1)}$$
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(BC) - P(AC) + P(ABC) \quad \text{(Figure 2)}$$
Theorem:
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i A_j) + \sum_{i<j<k} P(A_i A_j A_k) - \cdots + (-1)^{n+1} P(A_1 \cdots A_n)$$
Proof idea: express each disjoint piece, then add them up according to which sets each piece belongs to or doesn't belong to. $A_1 \cup \cdots \cup A_n$ can be split into a disjoint partition of sets, and
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum P(\text{disjoint pieces})$$
To check that the theorem is correct, see how many times each disjoint piece is counted. Suppose a piece belongs to exactly $k$ of the sets, say $A_1, \ldots, A_k$: in $P(A_1) + P(A_2) + \cdots$ it is counted $k$ times; in $\sum_{i<j} P(A_i A_j)$ it is counted $\binom{k}{2}$ times (it needs to contain both $A_i$ and $A_j$, which happens in $\binom{k}{2}$ different intersections).
Example: consider the piece $A \cap B \cap C^c$, as shown:
This piece is counted in $P(A \cup B \cup C)$ once. On the right side, $P(A) + P(B) + P(C)$ counts it twice; $-P(AB) - P(AC) - P(BC)$ subtracts it once; $+P(ABC)$ counts it zero times.
The sum: 2 - 1 + 0 = 1, so the piece is counted exactly once.
Example: consider the piece $A_1 \cap A_2 \cap A_3 \cap A_4^c$, so $k = 3$, $n = 4$.
$P(A_1) + P(A_2) + P(A_3) + P(A_4)$: counted $k$ times (3 times).
$-P(A_1A_2) - P(A_1A_3) - P(A_1A_4) - P(A_2A_3) - P(A_2A_4) - P(A_3A_4)$: subtracted $\binom{k}{2}$ times (3 times).
$+\sum_{i<j<k} P(A_iA_jA_k)$: counted $\binom{3}{3}$ times (1 time). Total: $3 - 3 + 1 = 1$.
In general:
$$k - \binom{k}{2} + \binom{k}{3} - \cdots + (-1)^{k+1}\binom{k}{k} = \text{sum of times counted}$$
To simplify, this is a binomial situation:
$$0 = (1-1)^k = \sum_{i=0}^{k} \binom{k}{i}(-1)^i(1)^{k-i} = \binom{k}{0} - \binom{k}{1} + \binom{k}{2} - \binom{k}{3} + \cdots$$
Therefore $0 = 1 - (\text{sum of times counted})$, so all disjoint pieces are counted exactly once.
** End of Lecture 3
Applying the theorem to the matching problem ($n$ letters placed at random into $n$ envelopes; $A_i$ = {letter $i$ lands in its own envelope}):
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_i P(A_i) - \sum_{i<j} P(A_i A_j) + \sum_{i<j<k} P(A_i A_j A_k) - \cdots = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}$$
Recall the Taylor series $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$. For $x = -1$: $e^{-1} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots$. Therefore the sum above is 1 minus the limit of this Taylor series as $n \to \infty$: when n is large, the probability converges to $1 - e^{-1} \approx 0.63$.
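A minimal simulation sketch of this limit, under the matching-problem reading above (the function name is illustrative):

```python
import random

# Estimate P(at least one fixed point) in a random permutation of n items.
def p_at_least_one_match(n, trials=100_000):
    hits = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        if any(perm[i] == i for i in range(n)):
            hits += 1
    return hits / trials

# For large n this should be close to 1 - 1/e = 0.632...
print(p_at_least_one_match(20))
```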
2.1 - Conditional Probability
Given that B happened, what is the probability that A also happened? The sample space is narrowed down to the part where B has occurred: the sample space now only includes outcomes consistent with event B having happened.
Definition: the conditional probability of event A given event B is
$$P(A|B) = \frac{P(AB)}{P(B)}$$
It is sometimes easier to calculate the intersection from the conditional probability: $P(AB) = P(A|B)P(B)$.
Example: roll 2 dice and let T be the sum; given that T is odd, find $P(T < 8)$. Let $B = \{T \text{ is odd}\}$, $A = \{T < 8\}$. Then $P(B) = \frac{18}{36} = \frac{1}{2}$; the odd sums below 8 are 3, 5, and 7, so $P(AB) = \frac{2 + 4 + 6}{36} = \frac{1}{3}$, and
$$P(A|B) = \frac{P(AB)}{P(B)} = \frac{1/3}{1/2} = \frac{2}{3}$$
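A minimal brute-force check of this example, enumerating all 36 equally likely outcomes:

```python
from itertools import product

# All 36 equally likely outcomes of two dice.
outcomes = [(i, j) for i, j in product(range(1, 7), repeat=2)]

B = [o for o in outcomes if (o[0] + o[1]) % 2 == 1]   # sum is odd
AB = [o for o in B if o[0] + o[1] < 8]                # sum odd and < 8

# P(A | B) = #(AB) / #(B) under equal likelihood
print(len(AB) / len(B))  # 0.666... = 2/3
```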
Example (clinical trial): B = {patient received the placebo}, A = {relapse}. From the trial's table of results, $P(A|B) = 0.7$ for the placebo group. Considering one of the treatment groups instead: $P(A|B) = \frac{13}{13 + 25} \approx 0.34$.
2.2 Independence of Events.
Recall $P(A|B) = \frac{P(AB)}{P(B)}$. Definition: A and B are independent if $P(A|B) = P(A)$, i.e.
$$P(A|B) = \frac{P(AB)}{P(B)} = P(A) \iff P(AB) = P(A)P(B)$$
Experiments can be physically independent (roll 1 die, then roll another die),
or seem physically related and still be independent.
Example: roll a die; $A = \{\text{odd}\} = \{1, 3, 5\}$, $B = \{1, 2, 3, 4\}$. Related events, but independent:
$$P(A) = \frac{1}{2}, \quad P(B) = \frac{2}{3}, \quad AB = \{1, 3\}, \quad P(AB) = \frac{2}{6} = \frac{1}{3} = \frac{1}{2}\cdot\frac{2}{3} = P(A)P(B)$$
Therefore they are independent. Independence does not imply that the sets do not intersect.
Example: toss an unfair coin twice; the two tosses are independent events. $P(H) = p$, $0 \le p \le 1$; find $P(TH)$ = P(tails first, heads second).
$$P(TH) = P(T)P(H) = (1-p)p$$
Since this is an unfair coin, the probability is not simply $\frac{1}{4}$. If the coin were fair, $P(TH) = \frac{\#(TH)}{\#\{HH, HT, TH, TT\}} = \frac{1}{4}$.
If you have several events $A_1, A_2, \ldots, A_n$ that you need to prove independent:
It is necessary to show that every subset of the events is independent: for all subsets $A_{i_1}, A_{i_2}, \ldots, A_{i_k}$ with $2 \le k \le n$, prove
$$P(A_{i_1} A_{i_2} \cdots A_{i_k}) = P(A_{i_1})P(A_{i_2})\cdots P(A_{i_k})$$
You could prove that any 2 events are independent, which is called pairwise independence, but this is not sufficient to prove that all the events are (mutually) independent.
Example of pairwise independence:
Consider a tetrahedral die, equally weighted.
Three of the faces are each colored red, blue, and green,
but the last face is multicolored, containing red, blue and green.
$$P(\text{red}) = \frac{2}{4} = \frac{1}{2} = P(\text{blue}) = P(\text{green})$$
$$P(\text{red and blue}) = \frac{1}{4} = \frac{1}{2}\cdot\frac{1}{2} = P(\text{red})P(\text{blue})$$
Therefore, the pair {red, blue} is independent.
The same can be proven for {red, green} and {blue, green}.
but, what about all three together?
$$P(\text{red, blue, and green}) = \frac{1}{4} \ne P(\text{red})P(\text{blue})P(\text{green}) = \frac{1}{8}$$
so the events are not fully (mutually) independent.
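A minimal enumeration sketch of this die confirming pairwise but not mutual independence (the face encoding is our own):

```python
from itertools import combinations

# Four equally likely faces: three single colors plus one face with all three.
faces = [{"red"}, {"blue"}, {"green"}, {"red", "blue", "green"}]

def P(colors):
    # Probability that every color in `colors` appears on the face shown.
    return sum(1 for f in faces if colors <= f) / len(faces)

for a, b in combinations(["red", "blue", "green"], 2):
    print(a, b, P({a, b}) == P({a}) * P({b}))    # True: pairwise independent

print(P({"red", "blue", "green"}),                # 0.25
      P({"red"}) * P({"blue"}) * P({"green"}))    # 0.125 -> not mutually independent
```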
Example: $P(H) = p$, $P(T) = 1 - p$ for an unfair coin. Toss the coin 5 times; then
$$P(HTHTT) = P(H)P(T)P(H)P(T)P(T) = p(1-p)p(1-p)(1-p) = p^2(1-p)^3$$
Example: find P(get 2H and 3T, in any order) = the sum of the probabilities over all orderings:
$$= P(HHTTT) + P(HTHTT) + \cdots = p^2(1-p)^3 + p^2(1-p)^3 + \cdots = \binom{5}{2} p^2(1-p)^3$$
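A minimal check that summing over orderings matches the $\binom{5}{2}$ formula (the value of p is arbitrary):

```python
import math
from itertools import product

p = 0.3  # arbitrary illustrative value for P(H)

# Sum P(sequence) over all 5-toss sequences with exactly 2 heads.
total = sum(
    p**seq.count("H") * (1 - p)**seq.count("T")
    for seq in ("".join(s) for s in product("HT", repeat=5))
    if seq.count("H") == 2
)

formula = math.comb(5, 2) * p**2 * (1 - p)**3
print(total, formula)  # both 0.3087
```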
Example: toss a coin until the result is heads, and suppose there are n tosses in total.
$$P(\text{number of tosses} = n) = ?$$
The sequence must be TTT...TH, where the number of T's is $n - 1$:
$$P(\text{tosses} = n) = P(TT\ldots TH) = (1-p)^{n-1}p$$
Example: in a criminal case, witnesses give a specific description of the couple seen fleeing the scene.
$$P(\text{random couple meets description}) = 8.3 \times 10^{-8} = p$$
We know at the beginning that 1 such couple exists. Perhaps a better question to ask is: given that a couple exists, what is the probability that another couple fits the same description?
Let A = {at least 1 couple matches} and B = {at least 2 couples match}; find $P(B|A)$:
$$P(B|A) = \frac{P(BA)}{P(A)} = \frac{P(B)}{P(A)}$$
(since $B \subset A$, we have $BA = B$).
If n = 8 million couples, $P(B|A) = 0.2966$, which is within reasonable doubt! $P(\text{2 couples exist}) < P(\text{1 couple exists})$, but given that 1 couple exists, the probability that 2 exist is not insignificant, as the sketch below checks.
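A minimal sketch of this computation, under the assumption that the number of matching couples is Binomial(n, p):

```python
p = 8.3e-8         # P(a random couple matches the description)
n = 8_000_000      # assumed number of couples

# P(exactly 0 matches) and P(exactly 1 match) under Binomial(n, p)
p0 = (1 - p) ** n
p1 = n * p * (1 - p) ** (n - 1)

P_A = 1 - p0       # at least one matching couple
P_B = 1 - p0 - p1  # at least two matching couples

print(P_B / P_A)   # ~0.295, close to the lecture's 0.2966
```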
In the large sample space, the probability that B occurs when we know that A occurred is significant!

2.3 Bayes's Theorem
It is sometimes useful to separate a sample space S into a set of disjoint partitions:
$B_1, \ldots, B_k$ - a partition of the sample space S: $B_i \cap B_j = \emptyset$ for $i \ne j$, and $S = \bigcup_{i=1}^{k} B_i$ (disjoint).
Total probability: $P(A) = \sum_{i=1}^{k} P(AB_i) = \sum_{i=1}^{k} P(A|B_i)P(B_i)$
(all $AB_i$ are disjoint, and $\bigcup_{i=1}^{k} AB_i = A$).
** End of Lecture 5
Solutions to Problem Set #1

1-1 (pg. 12 #9): $B_n = \bigcup_{i=n}^{\infty} A_i$, $C_n = \bigcap_{i=n}^{\infty} A_i$.
a) $B_n \supset B_{n+1} \supset \cdots$: $B_n = A_n \cup \big(\bigcup_{i=n+1}^{\infty} A_i\big) = A_n \cup B_{n+1}$, so $s \in B_{n+1} \Rightarrow s \in B_{n+1} \cup A_n = B_n$.
$C_n \subset C_{n+1} \subset \cdots$: $C_n = A_n \cap C_{n+1}$, so $s \in C_n = A_n \cap C_{n+1} \Rightarrow s \in C_{n+1}$.
b) $s \in \bigcap_{n=1}^{\infty} B_n \iff s \in B_n$ for all $n$ $\iff$ for all $n$, $s \in A_i$ for some $i \ge n$ $\iff$ $s$ belongs to infinitely many of the $A_i$, i.e., the events $A_i$ happen infinitely often.
c) $s \in \bigcup_{n=1}^{\infty} C_n \iff s \in C_n = \bigcap_{i=n}^{\infty} A_i$ for some $n$ $\iff$ $s \in$ all $A_i$ for $i \ge n$, i.e., all events happen starting at some $n$.

1-2 (pg. 18 #4): P(at least 1 fails) = 1 - P(neither fails) = 1 - 0.4 = 0.6.

1-3 (pg. 18 #12): given $A_1, A_2, \ldots$, define $B_1 = A_1$, $B_2 = A_1^c A_2$, ..., $B_n = A_1^c \cdots A_{n-1}^c A_n$. This splits the union into disjoint events and covers the entire space:
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(B_i)$$
This follows from $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$: take a point $s$ in $\bigcup_{i=1}^{n} A_i$; then $s$ is in at least one $A_i$. If $s \in A_1 = B_1$, done; if not, $s \in A_1^c$, and if $s \in A_2$, then $s \in A_1^c A_2 = B_2$; if not, continue. At some point the point belongs to a set: the sequence stops when $s \in A_1^c A_2^c \cdots A_{k-1}^c A_k = B_k$. So $s \in \bigcup_{i=1}^{n} B_i$, and $P(\bigcup_{i=1}^{n} A_i) = P(\bigcup_{i=1}^{n} B_i) = \sum_{i=1}^{n} P(B_i)$, using that the $B_i$ are disjoint. (One should also check that every point in some $B_i$ belongs to some $A_i$; this is immediate since $B_i \subset A_i$.) To see that the $B_i$ are disjoint, by construction for $i < j$: $B_i = A_1^c \cdots A_{i-1}^c A_i$ and $B_j = A_1^c \cdots A_i^c \cdots A_{j-1}^c A_j$, so $s \in B_i \Rightarrow s \in A_i$ while $s' \in B_j \Rightarrow s' \notin A_i$, which implies $s \ne s'$.

1-4 (pg. 27 #5): $\#(S) = 6 \cdot 6 \cdot 6 \cdot 6 = 6^4$; $\#(\text{all different}) = 6 \cdot 5 \cdot 4 \cdot 3 = P_{6,4}$;
$$P(\text{all different}) = \frac{P_{6,4}}{6^4} = \frac{5}{18}$$

1-5 (pg. 27 #7):
12 balls are placed into 20 boxes.
P(no box receives more than 1 ball) = P(each box has 0 or 1 balls), which also means that all balls fall into different boxes.
$\#(S) = 20^{12}$; $\#(\text{all different}) = 20 \cdot 19 \cdots 9 = P_{20,12}$
$$P = \frac{P_{20,12}}{20^{12}}$$
$$\frac{\binom{98}{8}}{\binom{100}{8}}$$
$$\binom{n + (r-n) - 1}{r-n} = \binom{r-1}{r-n}$$
Bayes's Formula.
Partition $B_1, \ldots, B_k$: $S = \bigcup_{i=1}^{k} B_i$, $B_i \cap B_j = \emptyset$ for $i \ne j$.
$$P(A) = \sum_{i=1}^{k} P(AB_i) = \sum_{i=1}^{k} P(A|B_i)P(B_i) \quad \text{(total probability)}$$
Example: In box 1, there are 60 short bolts and 40 long bolts. In box 2,
there are 10 short bolts and 20 long bolts. Take a box at random, and pick a bolt.
What is the probability that you chose a short bolt?
B1 = choose Box 1.
B2 = choose Box 2.
$$P(\text{short}) = P(\text{short}|B_1)P(B_1) + P(\text{short}|B_2)P(B_2) = \frac{60}{100}\cdot\frac{1}{2} + \frac{10}{30}\cdot\frac{1}{2} = 0.3 + \frac{1}{6} \approx 0.47$$
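A minimal sketch of this total-probability computation (the dictionary layout is our own):

```python
# Total probability for the two-box bolt example.
boxes = {
    "box1": {"p_box": 0.5, "p_short": 60 / 100},
    "box2": {"p_box": 0.5, "p_short": 10 / 30},
}

p_short = sum(b["p_box"] * b["p_short"] for b in boxes.values())
print(p_short)  # 0.4666...
```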
Example: partition $B_1, B_2, \ldots, B_k$, where you know the distribution $P(B_i)$. An event A can occur together with any of the $B_i$, and you know $P(A|B_i)$ for each $B_i$.
If you know that A happened, what is the probability that it came from a particular $B_i$?
$$P(B_i|A) = \frac{P(B_i A)}{P(A)} = \frac{P(A|B_i)P(B_i)}{P(A|B_1)P(B_1) + \cdots + P(A|B_k)P(B_k)} \quad \text{: Bayes's Formula}$$
Example (rare disease testing): even after a positive test result, the probability is still very small that you actually have the disease, because the prior probability of having the disease is small.
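The figures below are hypothetical stand-ins (the original example's numbers were not preserved in these notes); the point is that a small prior dominates the posterior:

```python
# Bayes's formula with hypothetical numbers: a rare disease and a decent test.
p_disease = 0.001        # prior P(disease) -- assumed for illustration
p_pos_given_d = 0.95     # test sensitivity -- assumed
p_pos_given_h = 0.05     # false positive rate -- assumed

p_pos = p_pos_given_d * p_disease + p_pos_given_h * (1 - p_disease)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(p_d_given_pos)     # ~0.019: still small despite the positive test
```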
Example: a gene has 2 alleles, A and a. The gene exhibits itself through a trait with two versions.
The possible phenotypes are dominant, with genotypes AA or Aa, and recessive, with genotype aa.
Alleles travel independently, each derived from a parent's genotype.
In a population, the probability of having a particular allele: P(A) = 0.5, P(a) = 0.5
Therefore, the probabilities of the genotypes are: P(AA) = 0.25, P(Aa) = 0.5, P(aa) = 0.25
Partitions: genotypes of parents: (AA, AA), (AA, Aa), (AA, aa), (Aa, Aa), (Aa, aa), (aa, aa).
Assume pairs match regardless of genotype.
Parent genotypes | Probability
(AA, AA) | $(\frac{1}{4})(\frac{1}{4}) = \frac{1}{16}$
(AA, Aa) | $2(\frac{1}{4})(\frac{1}{2}) = \frac{1}{4}$
(AA, aa) | $2(\frac{1}{4})(\frac{1}{4}) = \frac{1}{8}$
(Aa, Aa) | $(\frac{1}{2})(\frac{1}{2}) = \frac{1}{4}$
(Aa, aa) | $2(\frac{1}{2})(\frac{1}{4}) = \frac{1}{4}$
(aa, aa) | $(\frac{1}{4})(\frac{1}{4}) = \frac{1}{16}$
If you see that a person exhibits the dominant trait (e.g., dark hair), predict the genotypes of the parents. Using the probability of a dominant child for each parent pair (1, 1, 1, 3/4, 1/2, 0 respectively):
$$P((AA, AA)\,|\,\text{dominant}) = \frac{\frac{1}{16}(1)}{\frac{1}{16}(1) + \frac{1}{4}(1) + \frac{1}{8}(1) + \frac{1}{4}\big(\frac{3}{4}\big) + \frac{1}{4}\big(\frac{1}{2}\big) + \frac{1}{16}(0)} = \frac{1/16}{3/4} = \frac{1}{12}$$
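A minimal sketch of this posterior computation over parent-genotype pairs, using exact fractions:

```python
from fractions import Fraction as F

# (prior probability of parent pair, P(child shows dominant trait | pair))
pairs = {
    ("AA", "AA"): (F(1, 16), F(1)),
    ("AA", "Aa"): (F(1, 4),  F(1)),
    ("AA", "aa"): (F(1, 8),  F(1)),
    ("Aa", "Aa"): (F(1, 4),  F(3, 4)),
    ("Aa", "aa"): (F(1, 4),  F(1, 2)),
    ("aa", "aa"): (F(1, 16), F(0)),
}

p_dominant = sum(prior * lik for prior, lik in pairs.values())   # 3/4
posterior = {k: prior * lik / p_dominant for k, (prior, lik) in pairs.items()}
print(posterior[("AA", "AA")])  # 1/12
```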
You can do the same computation to find the posterior probability for each type of couple. Bayes's formula gives a prediction about the parents' genotypes, which you aren't able to observe directly.
Example: You have 1 machine.
In good condition, it produces defective items only 1% of the time; P(good condition) = 0.9.
In broken condition, it produces defective items 40% of the time; P(broken) = 0.1.
Sample 6 items, and nd that 2 are defective. Is the machine broken?
This is very similar to the medical example worked earlier in lecture:
$$P(\text{good}\,|\,\text{2 out of 6 defective}) = \frac{P(\text{2 of 6}|\text{good})P(\text{good})}{P(\text{2 of 6}|\text{good})P(\text{good}) + P(\text{2 of 6}|\text{broken})P(\text{broken})}$$
$$= \frac{\binom{6}{2}(0.01)^2(0.99)^4(0.9)}{\binom{6}{2}(0.01)^2(0.99)^4(0.9) + \binom{6}{2}(0.4)^2(0.6)^4(0.1)} \approx 0.04$$
So the machine is almost certainly broken.
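A minimal sketch of this posterior (the `binom_pmf` helper is our own name):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(k defective out of n) for defect rate p
    return comb(n, k) * p**k * (1 - p)**(n - k)

prior_good, prior_broken = 0.9, 0.1
lik_good = binom_pmf(2, 6, 0.01)
lik_broken = binom_pmf(2, 6, 0.4)

posterior_good = (lik_good * prior_good /
                  (lik_good * prior_good + lik_broken * prior_broken))
print(posterior_good)  # ~0.04
```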
** End of Lecture 7
3.1 - Random Variables and Distributions
A random variable transforms the outcome of an experiment into a number.
Definitions:
Probability Space: $(S, \mathcal{A}, P)$
$S$ - sample space, $\mathcal{A}$ - events, $P$ - probability.
A random variable is a function on S with values in the real numbers, $X: S \to \mathbb{R}$.
Examples:
Toss a coin 10 times; Sample Space = {HTH...HT, ...}, all configurations of H & T.
Random Variable X = number of heads, $X: S \to \mathbb{R}$;
$X: S \to \{0, 1, \ldots, 10\}$ for this example.
There are fewer values than outcomes in S, so you need to give the distribution of the random variable in order to get the entire picture; probabilities are therefore given.
Definition: the distribution of a random variable $X: S \to \mathbb{R}$ is defined by: for $A \subset \mathbb{R}$,
$$P(A) = P(X \in A) = P(\{s \in S : X(s) \in A\})$$
Example: uniform distribution on a finite number of values $\{1, 2, 3, \ldots, n\}$; each outcome has equal probability $f(s_k) = \frac{1}{n}$ (the uniform probability function).
For a random variable X with values in $\mathbb{R}$ and distribution $P(A) = P(X \in A)$, $A \subset \mathbb{R}$, we can redefine the probability space on the random variable's distribution:
$(\mathbb{R}, \mathcal{A}, P)$ - sample space $\mathbb{R}$, with $X: \mathbb{R} \to \mathbb{R}$ the identity map, $X(x) = x$. Then
$$P(X \in A) = P(\{x : X(x) \in A\}) = P(x \in A) = P(A)$$
All you need are the outcomes mapped to real numbers and the probabilities of the mapped outcomes.
Example: the uniform distribution on $[a, b]$, denoted $U[a, b]$, has p.d.f.
$$f(x) = \frac{1}{b-a} \text{ for } x \in [a, b]; \quad 0 \text{ for } x \notin [a, b]$$
Example: on a subinterval $[c, d]$ with $a < c < d < b$,
$$P([c, d]) = \int_c^d \frac{1}{b-a}\,dx = \frac{d-c}{b-a} \quad \text{(probability of a subinterval)}$$
Example: Exponential Distribution, with p.d.f. $f(x) = \alpha e^{-\alpha x}$ for $x \ge 0$:
$$\int_0^{\infty} \alpha e^{-\alpha x}\,dx = -e^{-\alpha x}\Big|_0^{\infty} = 1$$
Real world: the exponential distribution describes the life span of quality products (electronics).
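A minimal numeric sketch: the closed-form c.d.f. and a Monte Carlo tail check (the rate α is arbitrary):

```python
import math
import random

alpha = 2.0  # arbitrary rate for illustration

# Closed form: integral of alpha*exp(-alpha*x) from 0 to t is 1 - exp(-alpha*t)
print(1 - math.exp(-alpha * 10))  # ~1: nearly all mass lies below t = 10

# Monte Carlo: exponential samples via inversion, X = -ln(1 - U)/alpha
samples = [-math.log(1 - random.random()) / alpha for _ in range(100_000)]
print(sum(s > 1 for s in samples) / len(samples),  # empirical P(X > 1)
      math.exp(-alpha))                             # exact P(X > 1) = e^{-alpha}
```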
** End of Lecture 8
Discrete Random Variable - defined by its probability function (p.f.): values $\{s_1, s_2, \ldots\}$ with $f(s_i) = P(X = s_i)$.
Continuous - defined by its probability distribution function (p.d.f.), also called the density function:
$$f(x) \ge 0, \qquad \int f(x)\,dx = 1, \qquad P(X \in A) = \int_A f(x)\,dx$$
The c.d.f. is $F(x) = P(X \le x)$. Example: toss a fair coin once and let X = number of heads; then
$$P(X \le x) = 0 \text{ for } x < 0; \quad P(X \le x) = P(X = 0) = \tfrac{1}{2} \text{ for } x \in [0, 1); \quad P(X \le x) = P(X = 0 \text{ or } 1) = 1 \text{ for } x \in [1, \infty)$$
Properties of a c.d.f.:
1. non-decreasing;
2. $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$;
3. right continuous: $\lim_{y \to x^+} F(y) = F(x)$, where $F(y) = P(X \le y)$ for the event $\{X \le y\}$.
Probability of a random variable occurring within an interval:
$$P(x_1 < X \le x_2) = P(\{X \le x_2\} \setminus \{X \le x_1\}) = P(X \le x_2) - P(X \le x_1) = F(x_2) - F(x_1)$$
Note that $\{X \le x_2\} \supset \{X \le x_1\}$.
Probability of a single point x: $P(X = x) = F(x) - F(x^-)$, where $F(x^-) = \lim_{y \to x^-} F(y)$ and $F(x^+) = \lim_{y \to x^+} F(y)$.
If F is continuous at a point, the probability at that point equals 0; if there is a jump, the probability at the point is the size of the jump.
$$P(x_1 \le X \le x_2) = F(x_2) - F(x_1^-)$$
Recall $P(A) = P(X \in A)$, where X is a random variable with distribution P.
When observing a c.d.f:
Discrete: sum of probabilities at all the jumps = 1. Graph is horizontal in between the jumps, meaning that probability = 0 in those intervals.
If F is continuous (differentiable), then $f(x) = F'(x)$.
Quantile: for $p \in [0, 1]$, the p-quantile is $\inf\{x : F(x) = P(X \le x) \ge p\}$ - find the smallest point such that the probability accumulated up to that point is at least p. The area underneath the density f(x) up to this point x is equal to p.
If the 0.25 quantile is at $x = 0$, then $P(X \le 0) \ge 0.25$.
Note that for a discrete distribution such as the coin example above (a jump of size 0.5 at $x = 0$), the 0.25 quantile is at $x = 0$, but so are the 0.3 and 0.4 quantiles, all the way up to 0.5, as in the sketch below.
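A minimal grid-search sketch of the quantile definition, applied to the coin-toss c.d.f. from above (the grid is illustrative):

```python
# p-quantile as inf{x : F(x) >= p} for the one-coin-toss c.d.f.
# F(x) = 0 for x < 0, 0.5 for 0 <= x < 1, 1 for x >= 1.
def F(x):
    if x < 0:
        return 0.0
    return 0.5 if x < 1 else 1.0

def quantile(p, grid):
    # Search a grid of candidate points for the smallest x with F(x) >= p.
    return min(x for x in grid if F(x) >= p)

grid = [i / 10 for i in range(-10, 21)]
print(quantile(0.25, grid), quantile(0.5, grid), quantile(0.75, grid))  # 0.0 0.0 1.0
```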
What if you have 2 random variables, or several? For example, take a person and measure weight and height. The separate behavior of each variable tells you nothing about the pairing; you need to describe the joint distribution.
Consider a pair of random variables (X, Y).
Joint distribution of (X, Y): $P((X, Y) \in A)$ for an event (set) $A \subset \mathbb{R}^2$.
Discrete distribution: (X, Y) takes values $\{(s_1^1, s_1^2), (s_2^1, s_2^2), \ldots\}$, with joint p.f.
$$f(s_i^1, s_i^2) = P((X, Y) = (s_i^1, s_i^2)) = P(X = s_i^1, Y = s_i^2)$$
Often visualized as a table assigning a probability to each point, for example (rows indexed by the values of one variable, columns by the values of the other):

y \ x | $x_1$ | $x_2$ | $x_3$
0 | 0.1 | 0 | 0.2
-1 | 0 | 0 | 0
-2.5 | 0.2 | 0 | 0.4
5 | 0 | 0.1 | 0
Continuous distribution: joint p.d.f. $f(x, y) \ge 0$ with
$$P((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy, \qquad \iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = 1$$
Joint c.d.f.: $F(x, y) = P(X \le x, Y \le y)$. In the continuous case,
$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du, \qquad f(x, y) = \frac{\partial^2 F}{\partial x\,\partial y}$$
** End of Lecture 9
In the continuous case: $F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du$.

Marginal Distributions
Given the joint distribution of (X, Y), the individual distributions of X and Y are the marginal distributions.
Discrete (X, Y): marginal probability function
$$f_1(x) = P(X = x) = \sum_y P(X = x, Y = y) = \sum_y f(x, y)$$
In the table from the previous lecture of probabilities for each point (x, y): add up all the values across a row to determine the marginal probability of that row's value, as in the sketch below.
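A minimal sketch computing a marginal from the Lecture 9 table (the x-column labels are assumed placeholders):

```python
# Joint p.f. from the lecture's table: keys are (y, x) points; omitted entries are 0.
joint = {
    (0, "x1"): 0.1, (0, "x3"): 0.2,
    (-2.5, "x1"): 0.2, (-2.5, "x3"): 0.4,
    (5, "x2"): 0.1,
}

# Marginal of Y: sum the joint p.f. over all x for each y.
marginal_y = {}
for (y, x), p in joint.items():
    marginal_y[y] = marginal_y.get(y, 0) + p

print(marginal_y)                # ~{0: 0.3, -2.5: 0.6, 5: 0.1}
print(sum(marginal_y.values()))  # ~1.0
```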
Continuous (X, Y) with joint p.d.f. f(x, y): the p.d.f. of X is $f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$, and
$$F(x) = P(X \le x) = P(X \le x, Y < \infty) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f(u, y)\,dy\,du$$

Review of Distribution Types
Discrete distribution for (X, Y): joint p.f. $f(x, y) = P(X = x, Y = y)$.
Continuous: joint p.d.f. $f(x, y) \ge 0$, $\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = 1$.
Joint c.d.f.: $F(x, y) = P(X \le x, Y \le y)$; marginal c.d.f.: $F(x) = P(X \le x) = \lim_{y \to \infty} F(x, y)$.
Example marginal density: $f_1(x) = \frac{21}{8} x^2 (1 - x^4)$, $-1 \le x \le 1$.
Discrete values for X, Y in tabular form:

X \ Y | 1 | 2 | (marginal)
1 | 0.5 | 0 | 0.5
2 | 0 | 0.5 | 0.5
(marginal) | 0.5 | 0.5 |

Note: if all four entries had the value 0.25, the two variables would have the same marginal distributions, so the marginals do not determine the joint distribution.
Independent X and Y. Definition: X, Y are independent if $P(X \in A, Y \in B) = P(X \in A)P(Y \in B)$.
Joint c.d.f.: $F(x, y) = P(X \le x, Y \le y) = P(X \le x)P(Y \le y) = F_1(x)F_2(y)$ (an intersection of independent events).
The joint c.d.f. can be factored for independent random variables.
Implication for continuous (X, Y) with joint p.d.f. f(x, y) and marginals $f_1(x)$, $f_2(y)$:
$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du = F_1(x)F_2(y) = \int_{-\infty}^{x} f_1(u)\,du \int_{-\infty}^{y} f_2(v)\,dv$$
Take $\frac{\partial^2}{\partial x\,\partial y}$ of both sides: $f(x, y) = f_1(x)f_2(y)$. X and Y are independent if and only if the joint density is a product of the marginals.
From the figure: $P(\text{square}) = 0 \ne P(X \in \text{side}) \cdot P(Y \in \text{side})$, so such a joint distribution is not independent.
Example: $f(x, y) = kx^2y^2$ for $0 \le x \le 1$, $0 \le y \le 1$; 0 otherwise. This can be written as a product, so X and Y are independent:
$$f(x, y) = kx^2y^2\, I(0 \le x \le 1,\ 0 \le y \le 1) = k_1 x^2 I(0 \le x \le 1) \cdot k_2 y^2 I(0 \le y \le 1)$$
The conditions on x and y can be separated.
Note: Indicator Notation
$$I(x \in A) = 1 \text{ if } x \in A; \quad 0 \text{ if } x \notin A$$
For the discrete case, given a table of values, you can tell independence:

X \ Y | $b_1$ | $b_2$ | ... | $b_m$ | (marginal)
$a_1$ | $p_{11}$ | $p_{12}$ | ... | $p_{1m}$ | $p_{1+}$
... | ... | ... | ... | ... | ...
$a_n$ | $p_{n1}$ | $p_{n2}$ | ... | $p_{nm}$ | $p_{n+}$
(marginal) | $p_{+1}$ | $p_{+2}$ | ... | $p_{+m}$ |

$$p_{ij} = P(X = a_i, Y = b_j), \qquad p_{i+} = P(X = a_i) = \sum_{j=1}^{m} p_{ij}, \qquad p_{+j} = P(Y = b_j) = \sum_{i=1}^{n} p_{ij}$$
X and Y are independent if and only if $p_{ij} = p_{i+}\,p_{+j}$ for every i, j - all points in the table. A minimal check follows.
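A minimal independence check on the 2x2 table above (the tolerance comparison is our own choice):

```python
# Check discrete independence: p_ij == p_i+ * p_+j for every cell.
table = [[0.5, 0.0],
         [0.0, 0.5]]  # the lecture's 2x2 example

row_marg = [sum(row) for row in table]
col_marg = [sum(col) for col in zip(*table)]

independent = all(
    abs(table[i][j] - row_marg[i] * col_marg[j]) < 1e-12
    for i in range(len(table)) for j in range(len(table[0]))
)
print(independent)  # False: 0.5 != 0.5 * 0.5, so X and Y are dependent
```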
** End of Lecture 10