Random Variables
Distributions
Random Variable
• A random variable x takes on a defined set of
values with different probabilities.
• For example, if you roll a die, the outcome is random
(not fixed) and there are 6 possible outcomes, each of
which occurs with probability one-sixth.
• For example, if you poll people about their voting
preferences, the percentage of the sample that responds
“Yes on Proposition 100” is also a random variable (the
percentage will be slightly different every time you
poll).
[Figure: p(x) plotted against x = 1, 2, 3, 4, 5, 6; p(x) = 1/6 at every value]
$\sum_{\text{all } x} p(x) = 1$
Probability mass function (pmf)
x p(x)
1 p(x=1)=1/6
2 p(x=2)=1/6
3 p(x=3)=1/6
4 p(x=4)=1/6
5 p(x=5)=1/6
6 p(x=6)=1/6
Total: 1.0
Cumulative distribution function (CDF)
[Figure: P(x) plotted against x = 1, 2, 3, 4, 5, 6, rising through 1/6, 1/3, 1/2, 2/3, 5/6, 1.0]
Cumulative distribution
function
x P(x≤A)
1 P(x≤1)=1/6
2 P(x≤2)=2/6
3 P(x≤3)=3/6
4 P(x≤4)=4/6
5 P(x≤5)=5/6
6 P(x≤6)=6/6
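The two tables for the die can be reproduced in a few lines. This is a minimal sketch, assuming a fair six-sided die; the names `faces`, `pmf`, and `cdf` are illustrative and not part of the slides.

```python
from fractions import Fraction
from itertools import accumulate

# Fair six-sided die: each face has probability 1/6 (as in the pmf table).
faces = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in faces}

# The CDF at x is the running total of the pmf up to and including x.
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for x in faces:
    print(f"x={x}  p(x)={pmf[x]}  P(X<={x})={cdf[x]}")
```

Exact fractions are used so the cumulative values match the table entries (2/6 prints in lowest terms as 1/3).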
Practice Problem:
◼ The number of patients seen in the ER in any given hour is a
random variable represented by x. The probability distribution
for x is:
x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1
Review Question 1
If you toss a die, what’s the probability that you
roll a 3 or less?
a. 1/6
b. 1/3
c. 1/2
d. 5/6
e. 1.0
Review Question 2
Two dice are rolled and the sum of the face
values is six. What is the probability that at
least one of the dice came up a 3?
a. 1/5
b. 2/3
c. 1/2
d. 5/6
e. 1.0
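One way to check a conditional probability like this is to list every outcome and count. The sketch below assumes two fair, distinguishable dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two dice.
rolls = list(product(range(1, 7), repeat=2))

# Condition on the event "sum is six".
sum_is_six = [r for r in rolls if sum(r) == 6]

# Within that event, keep outcomes where at least one die shows a 3.
has_three = [r for r in sum_is_six if 3 in r]

prob = Fraction(len(has_three), len(sum_is_six))
print(sum_is_six, has_three, prob)
```

Only the outcomes consistent with the given information (sum of six) go in the denominator, which is the point of the question.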
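Continuous case: “probability density function” (pdf)
$p(x) = e^{-x}$
The total area under the curve is 1:
$\int_0^\infty e^{-x}\,dx = \Big[-e^{-x}\Big]_0^\infty = 0 + 1 = 1$
[Figure: p(x) plotted against x, with x = 1 and x = 2 marked]
$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \Big[-e^{-x}\Big]_1^2 = -e^{-2} - (-e^{-1}) = -.135 + .368 = .23$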
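The probability that x falls between 1 and 2 under the density e^(-x) can be checked numerically. This is a rough sketch that uses a simple midpoint rule (the `integrate` helper is hypothetical, written here for illustration) and compares it with the closed form from the antiderivative.

```python
import math

def pdf(x):
    """Density p(x) = e^(-x), taken for x >= 0."""
    return math.exp(-x)

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

numeric = integrate(pdf, 1, 2)
exact = math.exp(-1) - math.exp(-2)   # from the antiderivative -e^(-x)
print(round(numeric, 4), round(exact, 4))
```

For a continuous variable, probabilities are areas under the density, so any interval probability reduces to an integral of this kind.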
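Example 2: Uniform distribution
The uniform distribution: all values are equally likely.
$f(x) = 1$, for $1 \ge x \ge 0$
[Figure: p(x) plotted against x; constant height 1 between x = 0 and x = 1]
$\int_0^1 1\,dx = \Big[x\Big]_0^1 = 1 - 0 = 1$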
Example: Uniform distribution
What’s the probability that x is between 0 and ½?
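Because the density equals 1 everywhere on the interval from 0 to 1, the probability of any sub-interval is just its width. A worked version, assuming the same density f(x) = 1 on [0, 1]:

```latex
P\left(0 \le x \le \tfrac{1}{2}\right)
  = \int_0^{1/2} 1\,dx
  = \Big[x\Big]_0^{1/2}
  = \tfrac{1}{2} - 0
  = \tfrac{1}{2}
```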
Discrete case:
$E(X) = \sum_{\text{all } x} x_i\, p(x_i)$
Continuous case:
$E(X) = \int_{\text{all } x} x\, p(x)\,dx$
Symbol Interlude
◼ E(X) = µ
◼ these symbols are used interchangeably
Example: expected value
x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1
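For the ER example, the expected value is the probability-weighted sum of the possible values. A minimal sketch, with the table copied into a dictionary (the name `dist` is illustrative):

```python
# Probability distribution of x (patients per hour), copied from the table.
dist = {10: 0.4, 11: 0.2, 12: 0.2, 13: 0.1, 14: 0.1}

# E(X) = sum of x * p(x) over all x.
expected = sum(x * p for x, p in dist.items())
print(round(expected, 1))
```

Term by term this is 10(.4) + 11(.2) + 12(.2) + 13(.1) + 14(.1) = 4 + 2.2 + 2.4 + 1.3 + 1.4 = 11.3 patients per hour.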
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i \left(\frac{1}{n}\right)$
$\frac{1}{\binom{49}{6}} = \frac{1}{\frac{49!}{43!\,6!}} = \frac{1}{13{,}983{,}816} = 7.2 \times 10^{-8}$
“49 choose 6”: out of 49 numbers, this is the number of distinct combinations of 6.
The probability function (note, sums to 1.0):
x$            p(x)
-1            .999999928
+2,000,000    7.2 x 10^-8
Expected Value
E(X) = P(win)*$2,000,000 + P(lose)*-$1.00
= 2.0 x 10^6 * 7.2 x 10^-8 + .999999928 (-1) = .144 - .999999928 = -$.86
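The lottery arithmetic can be reproduced directly. This sketch assumes the payoffs stated above (win $2,000,000, otherwise lose $1) and a single winning combination out of all 6-number draws from 49.

```python
import math

combos = math.comb(49, 6)          # "49 choose 6"
p_win = 1 / combos
p_lose = 1 - p_win

# Payoffs taken from the slide: +$2,000,000 on a win, -$1 on a loss.
expected = p_win * 2_000_000 + p_lose * (-1)
print(combos, p_win, round(expected, 2))
```

Without rounding the intermediate probabilities, the expected value is about -$0.857, which agrees with the -$.86 above to the nearest cent.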
A roulette wheel has the numbers 1 through 36, as well as 0 and 00.
If you bet $1 that an odd number comes up, you win or lose $1
according to whether or not that event occurs. If random variable X
denotes your net gain, X=1 with probability 18/38 and X= -1 with
probability 20/38.
On average, the casino wins (and the player loses) 5 cents per game.
If the cost is $10 per game, the casino wins an average of 53 cents per
game. If 10,000 games are played in a night, that’s a cool $5300.
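The casino figures follow from the expected value of X. A short sketch, using exact fractions and the probabilities given for the bet on odd:

```python
from fractions import Fraction

# Net gain X on a $1 bet on odd: +1 with probability 18/38, -1 with 20/38.
dist = {1: Fraction(18, 38), -1: Fraction(20, 38)}

ev_per_dollar = sum(x * p for x, p in dist.items())
print(float(ev_per_dollar))                  # player's average net gain per $1 bet
print(float(-10 * ev_per_dollar))            # casino's average take on a $10 bet
print(float(-10 * ev_per_dollar * 10_000))   # casino's take over 10,000 such games
```

The exact expected value is -2/38, about -5.3 cents per dollar bet, so the nightly total is roughly $5,263, which the slide rounds to $5300.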
Expected value isn’t
everything though…
◼ Take the hit new show “Deal or No Deal”
◼ Everyone know the rules?
◼ Let’s say you are down to two cases left. $1
and $400,000. The banker offers you
$200,000.
◼ So, Deal or No Deal?
Deal or No Deal…
◼ This could really be represented as a
probability distribution and a non-
random variable:
x$ p(x)
+1 .50
+$400,000 .50
x$ p(x)
+$200,000 1.0
Expected value doesn’t help…
x$ p(x)
+1 .50
+$400,000 .50
x$ p(x)
+$200,000 1.0
$\mu = E(X) = 200{,}000$
How to decide?
Variance!
• If you take the deal, the variance/standard
deviation is 0.
• If you don’t take the deal, what is the average
deviation from the mean?
• What’s your gut guess?
Variance/standard deviation
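$\sigma^2 = Var(x) = E[(x - \mu)^2]$
Discrete case:
$Var(X) = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i)$
Continuous case:
$Var(X) = \int_{\text{all } x} (x - \mu)^2\, p(x)\,dx$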
Symbol Interlude
◼ Var(X) = σ²
◼ SD(X) = σ
◼ these symbols are used interchangeably
Similarity to empirical variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \left(\frac{1}{n-1}\right)$
$\sigma^2 = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i)$
$\sigma^2 = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i) = .997$
$\sigma = \sqrt{.997} = .99$
Standard deviation is $.99. Interpretation: On average, you’re
either 1 dollar above or 1 dollar below the mean, which is just
under zero. Makes sense!
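The figures here appear to correspond to the $1 roulette bet (X = +1 with probability 18/38, X = -1 with probability 20/38); that link is an assumption, since the slide does not restate the example. Under that assumption, the variance and standard deviation can be computed from the definition:

```python
import math

# Assumed example: the $1 roulette bet on odd from the earlier slide.
dist = {1: 18 / 38, -1: 20 / 38}

mu = sum(x * p for x, p in dist.items())
variance = sum((x - mu) ** 2 * p for x, p in dist.items())
sd = math.sqrt(variance)
print(round(mu, 3), round(variance, 3), round(sd, 3))
```

The variance comes out at about .997 and the standard deviation at just under $1, in line with the interpretation above.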
Review Question 3
The expected value and variance of a coin
toss (H=1, T=0) are?
a. .50, .50
b. .50, .25
c. .25, .50
d. .25, .25
Important discrete probability
distribution: The binomial
Binomial Probability
Distribution
◼ A fixed number of observations (trials), n
◼ e.g., 15 tosses of a coin; 20 patients; 1000 people
surveyed
◼ A binary outcome
◼ e.g., head or tail in each toss of a coin; disease or no
disease
◼ Generally called “success” and “failure”
◼ Probability of success is p, probability of failure is 1 – p
◼ Constant probability for each observation
◼ e.g., Probability of getting a tail is the same each time
we toss the coin
Binomial distribution
Take the example of 5 coin tosses.
What’s the probability that you flip
exactly 3 heads in 5 coin tosses?
Binomial distribution
Solution:
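One way to get exactly 3 heads: HHHTT, which has probability (½)^3 x (½)^2 = (½)^5
Number of ways to arrange 3 heads among 5 tosses: 5C3 = 5!/(3!2!) = 10
10 x (½)^5 = 31.25%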
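The count of 10 arrangements can be confirmed by brute force. This sketch enumerates every sequence of 5 fair coin tosses and compares the result with the counting formula; the variable names are illustrative.

```python
from itertools import product
from math import comb

# Enumerate all 2^5 equally likely sequences of 5 fair coin tosses.
sequences = list(product("HT", repeat=5))
three_heads = [s for s in sequences if s.count("H") == 3]

prob_by_counting = len(three_heads) / len(sequences)
prob_by_formula = comb(5, 3) * 0.5 ** 5
print(len(three_heads), prob_by_counting, prob_by_formula)
```

Enumeration only works for small n, which is why the closed-form expression matters once the number of trials grows.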
Binomial distribution
function:
X= the number of heads tossed in 5 coin
tosses
[Figure: p(x) plotted against x = 0, 1, 2, 3, 4, 5 (number of heads)]
Binomial distribution, generally
Note the general pattern emerging → if you have only two possible
outcomes (call them 1/0 or yes/no or success/failure) in n independent
trials, then the probability of exactly X “successes”=
$\binom{n}{X}\, p^X (1-p)^{n-X}$
where
n = number of trials
X = number of successes out of n trials
p = probability of success
1-p = probability of failure
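The general formula translates almost symbol for symbol into code. A minimal sketch; `binomial_pmf` is a hypothetical helper name, not something defined in the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials, success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(3, 5, 0.5))

# The probabilities over x = 0..n should sum to 1.
print(sum(binomial_pmf(x, 5, 0.5) for x in range(6)))
```

The `comb(n, x)` factor counts the arrangements, and the two powers give the probability of any single arrangement.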
Binomial distribution: example
◼ If I toss a coin 20 times, what’s the
probability of getting exactly 10 heads?
$\binom{20}{10} (.5)^{10} (.5)^{10} = .176$
Binomial distribution: example
◼ If I toss a coin 20 times, what’s the
probability of getting 2 or fewer heads?
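$\binom{20}{0} (.5)^{0} (.5)^{20} = \frac{20!}{20!\,0!} (.5)^{20} = 9.5 \times 10^{-7}$ +
$\binom{20}{1} (.5)^{1} (.5)^{19} = \frac{20!}{19!\,1!} (.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5}$ +
$\binom{20}{2} (.5)^{2} (.5)^{18} = \frac{20!}{18!\,2!} (.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4}$
$= 2.0 \times 10^{-4}$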
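A cumulative probability such as "2 or fewer heads" is a sum of individual binomial terms, so it is easy to check by machine. A short sketch for n = 20 and p = .5:

```python
from math import comb

n, p = 20, 0.5

# P(X <= 2) = P(X=0) + P(X=1) + P(X=2)
terms = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(3)]
total = sum(terms)
print(terms, total)
```

Since every term shares the factor (.5)^20, the total is (1 + 20 + 190) x (.5)^20 = 211 x 9.5 x 10^-7, about 2.0 x 10^-4.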
**All probability distributions are
characterized by an expected value and a
variance:
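For a binomial random variable, the standard results are E(X) = np and Var(X) = np(1-p). The sketch below checks both against a direct sum over the pmf; the helper name and the values n = 20, p = .5 are illustrative choices, not taken from the slide.

```python
from math import comb, isclose

def binomial_mean_var(n, p):
    """Mean and variance of Binomial(n, p), computed directly from the pmf."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, var

n, p = 20, 0.5   # illustrative values: 20 tosses of a fair coin
mean, var = binomial_mean_var(n, p)

# Compare against the closed forms np and np(1 - p).
print(mean, var, isclose(mean, n * p), isclose(var, n * p * (1 - p)))
```

The closed forms save the summation entirely, which is why they are the ones used in practice.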
Practice Problem:
You are conducting a case-control study of
smoking and lung cancer. If the probability of
being a smoker among lung cancer cases is .6,
what’s the probability that in a group of 8 cases
you have:
[Figure: distribution over x = 0, 1, 2, 3, 4, 5, 6, 7, 8]
Answer, continued
[Figure: distribution over x = 0, 1, 2, 3, 4, 5, 6, 7, 8]
Review Question 4
In your case-control study of smoking and
lung-cancer, 60% of cases are smokers versus
only 10% of controls. What is the odds ratio
between smoking and lung cancer?
a. 2.5
b. 13.5
c. 15.0
d. 6.0
e. .05
$\frac{.6/.4}{.1/.9} = \frac{3}{2} \times \frac{9}{1} = \frac{27}{2} = 13.5$
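The odds ratio is the odds of smoking among cases divided by the odds of smoking among controls. A minimal sketch using the proportions from the question (variable names are illustrative):

```python
# Proportions of smokers, as given in the question.
p_cases = 0.6      # smokers among lung cancer cases
p_controls = 0.1   # smokers among controls

odds_cases = p_cases / (1 - p_cases)            # .6 / .4
odds_controls = p_controls / (1 - p_controls)   # .1 / .9
odds_ratio = odds_cases / odds_controls
print(round(odds_ratio, 1))
```

Note that odds are p / (1 - p), not p itself, so the ratio of the raw proportions (.6 / .1 = 6) is not the odds ratio.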
Review Question 5
What’s the probability of getting exactly 5
heads in 10 coin tosses?
a. $\binom{10}{0} (.50)^{5} (.50)^{5}$
b. $\binom{10}{5} (.50)^{5} (.50)^{5}$
c. $\binom{10}{5} (.50)^{10} (.50)^{5}$
d. $\binom{10}{10} (.50)^{10} (.50)^{0}$
Review Question 6
A coin toss can be thought of as an example of
a binomial distribution with N=1 and p=.5.
What are the expected value and variance of a
coin toss?
a. .5, .25
b. 1.0, 1.0
c. 1.5, .5
d. .25, .5
e. .5, .5
Review Question 7
If I toss a coin 10 times, what is the expected
value and variance of the number of heads?
a. 5, 5
b. 10, 5
c. 2.5, 5
d. 5, 2.5
e. 2.5, 10
Review Question 8
In a randomized trial with n=150, the goal is to
randomize half to treatment and half to
control. The number of people randomized to
treatment is a random variable X. What is the
probability distribution of X?
a. X~Normal(μ=75, σ=10)
b. X~Exponential(μ=75)
c. X~Uniform
d. X~Binomial(N=150, p=.5)
e. X~Binomial(N=75, p=.5)
Review Question 8
In a randomized trial with n=150, every
subject has a 50% chance of being randomized
to treatment. The number of people
randomized to treatment is a random variable
X. What is the probability distribution of X?
a. X~Normal(μ=75, σ=10)
b. X~Exponential(μ=75)
c. X~Uniform
d. X~Binomial(N=150, p=.5)
e. X~Binomial(N=75, p=.5)
Review Question 9
In the same RCT with n=150, if 69
end up in the treatment group and 81
in the control group, how far off is
that from expected?
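One way to answer "how far off" is to express the gap in standard deviations of the binomial count. The sketch below assumes X ~ Binomial(150, .5), as in the previous question; the name `z` for the standardized distance is an illustrative choice.

```python
import math

n, p = 150, 0.5     # each subject has a 50% chance of treatment
observed = 69       # number actually randomized to treatment

expected = n * p                    # np
sd = math.sqrt(n * p * (1 - p))     # sqrt(np(1-p))
z = (observed - expected) / sd      # distance from expected, in SD units
print(expected, round(sd, 2), round(z, 2))
```

With 75 expected and a standard deviation of about 6.1, a count of 69 sits roughly one standard deviation below expectation, which is unremarkable.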
$\sigma_x = \sqrt{np(1-p)}$
For proportion:
$\mu_{\hat{p}} = p$ (differs by a factor of n)
$\sigma^2_{\hat{p}} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
P-hat stands for “sample proportion.”
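The relationship between the standard deviation of a count and that of a proportion can be checked numerically. A small sketch, borrowing n = 150 and p = .5 from the randomized-trial example; `proportion_sd` is a hypothetical helper name.

```python
import math

def proportion_sd(p, n):
    """SD of the sample proportion p-hat = X / n, where X ~ Binomial(n, p)."""
    return math.sqrt(p * (1 - p) / n)

n, p = 150, 0.5   # values borrowed from the randomized-trial example
sd_count = math.sqrt(n * p * (1 - p))
sd_prop = proportion_sd(p, n)

# Dividing the count by n divides its standard deviation by n as well.
print(round(sd_count, 3), round(sd_prop, 4), math.isclose(sd_count / n, sd_prop))
```

This is the same information on two scales: about 6.1 people out of 150, or about 4.1 percentage points.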
It all comes back to normal…
◼ Statistics for proportions are based on a
normal distribution, because the
binomial can be approximated as
normal if np>5 (and, by the usual rule of thumb, n(1-p)>5 as well)