Section 1: Introduction and Probability Concepts

Carlos M. Carvalho
The University of Texas at Austin
McCombs School of Business
http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/
Getting Started
- Syllabus
- General Expectations
  1. Read the notes
  2. Work on homework assignments
  3. Be on schedule
Course Overview
Section 1: Introduction and Probability Concepts
Section 2: Learning from Data: Estimation, Confidence Intervals and Testing Hypotheses
Section 3: Simple Linear Regression
Section 4: Multiple Linear Regression
Section 5: More on MLR, Dummy Variables, Interactions
Let’s start with a question...
My entire portfolio is in U.S. equities. How would you describe the potential outcomes for my returns in 2017?
Introduction
Probability and statistics let us talk efficiently about things we are unsure about.
- How likely is Trump to finish a four-year term?
- How much will Amazon sell next quarter?
- What will the return of my retirement portfolio be next year?
- How often will users click on a particular Facebook ad?
All of these involve inferring or predicting unknown quantities!!
Random Variables
- Random variables are numbers that we are NOT sure about, but we might have some idea of how to describe their potential outcomes.
- Example: Suppose we are about to toss two coins. Let X denote the number of heads. We say that X is the random variable that stands for the number we are not sure about.
Probability
Probability is a language designed to help us talk and think about
aggregate properties of random variables. The key idea is that to
each event we will assign a number between 0 and 1 which reflects
how likely that event is to occur. For such an immensely useful
language, it has only a few basic rules.
1. If an event A is certain to occur, it has probability 1, denoted P(A) = 1.
2. P(not-A) = 1 − P(A).
3. If two events A and B are mutually exclusive (both cannot occur simultaneously), then P(A or B) = P(A) + P(B).
4. P(A and B) = P(A)P(B|A) = P(B)P(A|B).
Probability Distribution
- We describe the behavior of random variables with a Probability Distribution.
- Example: If X is the random variable denoting the number of heads in two independent coin tosses, we can describe its behavior through the following probability distribution:

      X = 0 with prob. 0.25
          1 with prob. 0.5
          2 with prob. 0.25

- X is called a Discrete Random Variable because we are able to list all the possible outcomes.
- Question: What is Pr(X = 0)? How about Pr(X ≥ 1)?
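As a quick check of this distribution, the sketch below enumerates the four equally likely outcomes of two tosses and adds up their probabilities. It assumes fair, independent coins (as the example does) and uses exact fractions from Python's standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two fair, independent tosses.
outcomes = list(product("HT", repeat=2))
p_outcome = Fraction(1, len(outcomes))  # each outcome has probability 1/4

# Build the distribution of X = number of heads by adding outcome probabilities.
dist = {}
for toss in outcomes:
    x = toss.count("H")
    dist[x] = dist.get(x, 0) + p_outcome

p_x0 = dist[0]
p_x_ge_1 = sum(p for x, p in dist.items() if x >= 1)

print(dict(sorted(dist.items())))
print(p_x0, p_x_ge_1)
```

Note that Pr(X ≥ 1) also follows from rule 2: it is 1 − Pr(X = 0).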
Conditional, Joint and Marginal Distributions
In general, we want to use probability to address problems involving more than one variable at a time.
Think back to our first question on the returns of my portfolio... if we know that the economy will be growing next year, does that change our assessment of the behavior of my returns?
We need to be able to describe what we think will happen to one variable relative to another...
Here’s an example: we want to answer questions like: How are my sales impacted by the overall economy?
Let E denote the performance of the economy next quarter... for simplicity, say E = 1 if the economy is expanding and E = 0 if the economy is contracting (what kind of random variable is this?).
Let’s assume pr(E = 1) = 0.7.
Let S denote my sales next quarter... and let’s suppose the following probability statements:

S    pr(S|E = 1)    pr(S|E = 0)
1    0.05           0.20
2    0.20           0.30
3    0.50           0.30
4    0.25           0.20

These are called Conditional Distributions.
- The pr(S|E = 1) column is the conditional distribution of S given E = 1.
- The pr(S|E = 0) column is the conditional distribution of S given E = 0.
- We read: the probability of sales of 4 (S = 4) given (or conditional on) the economy growing (E = 1) is 0.25.
The conditional distributions tell us about what can happen to S for a given value of E... but what about S and E jointly?
pr(S = 4 and E = 1) = pr(E = 1) × pr(S = 4|E = 1) = 0.70 × 0.25 = 0.175
In English: 70% of the time the economy grows, and 1/4 of those times sales equal 4... 25% of 70% is 17.5%.
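The same multiplication rule (rule 4) gives every cell of the joint distribution at once. The sketch below, a minimal illustration using the numbers from this example, multiplies pr(E = e) by each conditional probability pr(S = s | E = e); the dictionary layout is just one convenient choice, and exact fractions avoid rounding noise.

```python
from fractions import Fraction as F

# Marginal distribution of the economy E (from the example).
p_E = {1: F("0.7"), 0: F("0.3")}

# Conditional distributions pr(S = s | E = e), one dict per value of e.
p_S_given_E = {
    1: {1: F("0.05"), 2: F("0.20"), 3: F("0.50"), 4: F("0.25")},
    0: {1: F("0.20"), 2: F("0.30"), 3: F("0.30"), 4: F("0.20")},
}

# Rule 4: pr(S = s and E = e) = pr(E = e) * pr(S = s | E = e).
joint = {
    (s, e): p_E[e] * p_s
    for e, cond in p_S_given_E.items()
    for s, p_s in cond.items()
}

# Print one row per value of E, with S = 1..4 across the columns.
for e in (0, 1):
    row = [float(joint[(s, e)]) for s in (1, 2, 3, 4)]
    print(f"E = {e}:", row)
```

Because each conditional distribution sums to 1 and pr(E) sums to 1, the eight joint probabilities also sum to 1.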
[Figure: probability tree displaying the 8 possible outcomes for (S, E). The first split is on E (pr(E = 1) = 0.7, pr(E = 0) = 0.3) and the second on S given E; multiplying along a path gives that outcome's joint probability, e.g. pr(S = 4 and E = 1) = 0.7 × 0.25 = 0.175.]
What is pr(S = 4)? 0.175 + 0.06 = 0.235
We can display the joint distribution of S and E in a two-way table!
Why we call marginals marginals... the table represents the joint, and at the margins we get the marginals.
The entry in the s column and e row gives pr(S = s and E = e).

            S = 1   S = 2   S = 3   S = 4   pr(E)
E = 0       0.060   0.090   0.090   0.060   0.30
E = 1       0.035   0.140   0.350   0.175   0.70
pr(S)       0.095   0.230   0.440   0.235   1.00
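The margins of the table are just row and column sums of the joint probabilities. The sketch below is one minimal way to compute them, copying the joint table above into a dictionary keyed by (s, e); the names are illustrative.

```python
from fractions import Fraction as F

# Joint distribution pr(S = s and E = e) from the two-way table, keyed by (s, e).
joint = {
    (1, 0): F("0.060"), (2, 0): F("0.090"), (3, 0): F("0.090"), (4, 0): F("0.060"),
    (1, 1): F("0.035"), (2, 1): F("0.140"), (3, 1): F("0.350"), (4, 1): F("0.175"),
}

# Marginal of S: add down each column (sum over the values of E).
p_S = {s: sum(p for (s2, _), p in joint.items() if s2 == s) for s in (1, 2, 3, 4)}

# Marginal of E: add across each row (sum over the values of S).
p_E = {e: sum(p for (_, e2), p in joint.items() if e2 == e) for e in (0, 1)}

print({s: float(p) for s, p in p_S.items()})
print({e: float(p) for e, p in p_E.items()})
```

The E marginal recovers the pr(E = 1) = 0.7 we started from, which is a useful sanity check on the table.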
Example... Given E = 1, what is the probability of S = 4?
$$\mathrm{pr}(S = 4 \mid E = 1) = \frac{\mathrm{pr}(S = 4, E = 1)}{\mathrm{pr}(E = 1)} = \frac{0.175}{0.7} = 0.25$$
Example... Given S = 4, what is the probability of E = 1?
$$\mathrm{pr}(E = 1 \mid S = 4) = \frac{\mathrm{pr}(S = 4, E = 1)}{\mathrm{pr}(S = 4)} = \frac{0.175}{0.235} = 0.745$$
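Both conditional probabilities come from the same recipe: divide a joint probability by the marginal of the variable being conditioned on. The sketch below wraps that recipe in two small helper functions (hypothetical names, for illustration only) applied to the joint table of this example.

```python
from fractions import Fraction as F

# Joint table pr(S = s and E = e), keyed by (s, e), copied from the example.
joint = {
    (1, 0): F("0.060"), (2, 0): F("0.090"), (3, 0): F("0.090"), (4, 0): F("0.060"),
    (1, 1): F("0.035"), (2, 1): F("0.140"), (3, 1): F("0.350"), (4, 1): F("0.175"),
}

def pr_S_given_E(s, e):
    """pr(S = s | E = e) = pr(S = s, E = e) / pr(E = e)."""
    p_e = sum(p for (_, e2), p in joint.items() if e2 == e)
    return joint[(s, e)] / p_e

def pr_E_given_S(e, s):
    """pr(E = e | S = s) = pr(S = s, E = e) / pr(S = s)."""
    p_s = sum(p for (s2, _), p in joint.items() if s2 == s)
    return joint[(s, e)] / p_s

print(float(pr_S_given_E(4, 1)))            # 0.25
print(round(float(pr_E_given_S(1, 4)), 3))  # 0.745
```

The numerator is the same joint probability in both cases (0.175); only the denominator changes, which is why the two answers differ.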
Independence
Two random variables X and Y are independent if
pr(Y = y | X = x) = pr(Y = y)
for all possible x and y.
In other words, knowing X tells you nothing about Y!
E.g., tossing a coin 2 times... what is the probability of getting H in the second toss given we saw a T in the first one?
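A minimal numerical check of the definition, comparing a conditional probability with the corresponding unconditional one. It assumes two fair coins, and for the sales example it reuses the numbers computed earlier (pr(S = 4, E = 1) = 0.175, pr(S = 4) = 0.235). Checking one pair of values can only disprove independence; establishing it requires the equality for every x and y.

```python
from fractions import Fraction as F
from itertools import product

# Two fair coin tosses: each of the 4 outcomes has probability 1/4.
coins = {outcome: F(1, 4) for outcome in product("HT", repeat=2)}

# pr(second = H | first = T) = pr(first = T and second = H) / pr(first = T)
p_first_T = sum(p for (a, _), p in coins.items() if a == "T")
p_T_then_H = coins[("T", "H")]
p_second_H = sum(p for (_, b), p in coins.items() if b == "H")
print(p_T_then_H / p_first_T, p_second_H)  # equal: consistent with independence

# Sales and economy: pr(S = 4 | E = 1) versus pr(S = 4).
p_S4_given_E1 = F("0.175") / F("0.7")
p_S4 = F("0.175") + F("0.06")
print(p_S4_given_E1 == p_S4)  # False: knowing E changes our view of S
```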
Trump’s victory
Let’s try to figure out why people were so confused on November 8th, 2016...
I am simplifying things a bit, but at the start of the day, Trump had to win 5 states to get the presidency: Florida, North Carolina, Pennsylvania, Michigan and Wisconsin. One could also say that each of these states was a 50-50 chance for Trump and Hillary.
So, based on this information, what was the probability of a Trump victory? (Homework: make sure to revisit this at home.)
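One way to start the homework is to compute the answer under two extreme, purely illustrative assumptions about how the five states relate: fully independent outcomes, or outcomes that all move together. The sketch below uses only the simplified 50-50 numbers from the slide; which assumption is closer to reality is exactly the question to think about.

```python
# Simplified setup from the slide: 5 must-win states, each a 50-50 chance.
p_state = 0.5
n_states = 5

# Assumption 1 (illustrative): the five state outcomes are independent,
# so the probability of winning all of them is the product of the five.
p_win_independent = p_state ** n_states

# Assumption 2 (illustrative): the outcomes are perfectly dependent
# (all states swing together), so winning one means winning them all.
p_win_perfectly_dependent = p_state

print(p_win_independent, p_win_perfectly_dependent)
```

The two answers differ by a factor of 16, which shows how much the independence assumption can matter.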
Disease Testing Example
Let D = 1 indicate you have a disease.
Let T = 1 indicate that you test positive for it.
Most doctors think in terms of pr(D) and pr(T|D).
[Figure: probability tree with pr(D = 1) = 0.02, pr(D = 0) = 0.98, pr(T = 1 | D = 1) = 0.95 and pr(T = 1 | D = 0) = 0.01, leading to the joint table below.]

          D = 0     D = 1
T = 0     0.9702    0.001
T = 1     0.0098    0.019

If you take the test and the result is positive, you are really interested in the question: Given that you tested positive, what is the chance you have the disease?
Disease Testing Example
$$\mathrm{pr}(D = 1 \mid T = 1) = \frac{\mathrm{pr}(D = 1, T = 1)}{\mathrm{pr}(T = 1)} = \frac{0.019}{0.019 + 0.0098} = 0.66$$
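The calculation above is Bayes’ rule: build the two joint probabilities with T = 1 from the prior and the test’s error rates, then divide. The sketch below packages it as a small function; the function and parameter names are hypothetical, while the input values (0.02, 0.95, 0.01) are the ones used in this example.

```python
def posterior_disease(prior, sensitivity, false_positive_rate):
    """pr(D = 1 | T = 1) by Bayes' rule.

    prior               = pr(D = 1)
    sensitivity         = pr(T = 1 | D = 1)
    false_positive_rate = pr(T = 1 | D = 0)
    """
    p_pos_and_sick = prior * sensitivity                   # pr(T = 1, D = 1)
    p_pos_and_healthy = (1 - prior) * false_positive_rate  # pr(T = 1, D = 0)
    return p_pos_and_sick / (p_pos_and_sick + p_pos_and_healthy)

posterior = posterior_disease(0.02, 0.95, 0.01)
print(round(posterior, 2))  # 0.66
```

Trying a smaller prior (a rarer disease) with the same test shows the posterior dropping sharply, because false positives from the large healthy group dominate.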
Disease Testing Example
- Try to think about this intuitively... imagine you are about to test 100,000 people.
- We assume that about 2,000 of those have the disease.
- We also expect 1% of the disease-free people to test positive, i.e., 980, and 95% of the sick people to test positive, i.e., 1,900. So, we expect a total of 2,880 positive tests.
- Choose one of the 2,880 people at random... what is the probability that he/she has the disease?
  p(D = 1|T = 1) = 1,900/2,880 = 0.66
- Isn’t that the same?!
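The counting argument can also be run as a simulation: generate 100,000 people, give each a disease status and a test result at the example’s rates, and look at the fraction of positives who are sick. This is a Monte Carlo sketch, so the counts vary with the seed (an arbitrary choice here) and only approximate 2,880 and 0.66.

```python
import random

# Monte Carlo version of the "test 100,000 people" thought experiment,
# using the rates from the example: pr(D = 1) = 0.02,
# pr(T = 1 | D = 1) = 0.95 and pr(T = 1 | D = 0) = 0.01.
random.seed(1)
n_people = 100_000
positives = 0
sick_positives = 0

for _ in range(n_people):
    sick = random.random() < 0.02          # draw disease status
    p_positive = 0.95 if sick else 0.01    # pr(T = 1 | D) for this person
    if random.random() < p_positive:       # draw test result
        positives += 1
        sick_positives += sick             # True counts as 1

print(positives, sick_positives, round(sick_positives / positives, 2))
```

With a large population the simulated fraction settles near the exact Bayes’ rule answer, which is why the intuitive count and the formula agree.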

Probability and Decisions
Suppose you are presented with an investment opportunity in the
development of a drug... probabilities are a vehicle to help us build
scenarios and make decisions.
We basically have a new random variable, i.e., our revenue, with the following probabilities...

Revenue        P(Revenue)
$250,000       0.7
$0             0.138
$25,000,000    0.162

The expected revenue is then $4,225,000...
So, should we invest or not?
Let’s get back to the drug investment example...
What if you could choose this investment instead?

Revenue        P(Revenue)
$3,721,428     0.7
$0             0.138
$10,000,000    0.162

The expected revenue is still $4,225,000...
What is the difference?
Mean and Variance of a Random Variable
The Mean or Expected Value is defined as (for a discrete X):
$$E(X) = \sum_{i=1}^{n} \Pr(x_i) \times x_i$$
We weight each possible value by how likely it is... this provides us with a measure of centrality of the distribution... a “good” prediction for X!
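The definition translates directly into a weighted sum. The sketch below applies it to the two drug investments from the previous slides, representing each distribution as a list of (value, probability) pairs, which is simply an illustrative choice of data structure.

```python
def expected_value(dist):
    """E(X) = sum over outcomes of Pr(x_i) * x_i, for a discrete distribution
    given as a list of (value, probability) pairs."""
    return sum(p * x for x, p in dist)

# The two drug investments from the slides: (revenue in dollars, probability).
investment_a = [(250_000, 0.7), (0, 0.138), (25_000_000, 0.162)]
investment_b = [(3_721_428, 0.7), (0, 0.138), (10_000_000, 0.162)]

print(round(expected_value(investment_a)))  # 4225000
print(round(expected_value(investment_b)))  # 4225000
```

Both means agree (investment B's is 4,224,999.6 before rounding), so the mean alone cannot tell the two investments apart.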
The Variance is defined as (for a discrete X):
$$\mathrm{Var}(X) = \sum_{i=1}^{n} \Pr(x_i) \times [x_i - E(X)]^2$$
It is a weighted average of squared prediction errors... This is a measure of the spread of a distribution. Riskier distributions have larger variance.
The Standard Deviation
- What are the units of E(X)? What are the units of Var(X)?
- A more intuitive way to understand the spread of a distribution is to look at the standard deviation:
$$\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)}$$
- What are the units of sd(X)?
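Variance and standard deviation answer the earlier question “What is the difference?” between the two drug investments. The sketch below, assuming the same (value, probability) list representation as before, computes both measures; the standard deviation comes out in dollars, the same units as the revenue itself, while the variance is in squared dollars.

```python
import math

def expected_value(dist):
    """E(X) for a list of (value, probability) pairs."""
    return sum(p * x for x, p in dist)

def variance(dist):
    """Var(X) = sum of Pr(x_i) * (x_i - E(X))**2."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist)

def sd(dist):
    """sd(X) = sqrt(Var(X)), in the same units as X."""
    return math.sqrt(variance(dist))

investment_a = [(250_000, 0.7), (0, 0.138), (25_000_000, 0.162)]
investment_b = [(3_721_428, 0.7), (0, 0.138), (10_000_000, 0.162)]

for name, dist in [("A", investment_a), ("B", investment_b)]:
    print(name, round(expected_value(dist)), round(sd(dist)))
```

Investment A’s standard deviation is roughly 9.1 million dollars versus roughly 2.8 million for B: same mean, but A is far riskier.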
Continuous Random Variables
- Suppose we are trying to predict tomorrow’s return on the S&P500...
- Question: What is the random variable of interest?
- Question: How can we describe our uncertainty about tomorrow’s outcome?
- Listing all possible values seems like a crazy task... we’ll work with intervals instead.
- These are called continuous random variables.
- The probability of an interval is defined by the area under the probability density function.
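To make “probability = area under the pdf” concrete, the sketch below approximates an area numerically with the trapezoid rule. It uses the standard normal density (introduced on the next slides) as the example pdf and compares the result with the closed-form value built from `math.erf`; the helper names and the number of slices are illustrative choices.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under(pdf, a, b, n=10_000):
    """Trapezoid-rule approximation of the area under pdf on [a, b],
    i.e. Pr(a < X < b) for a continuous random variable with density pdf."""
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, n):
        total += pdf(a + k * h)
    return total * h

approx = area_under(normal_pdf, -1.0, 1.0)
exact = math.erf(1 / math.sqrt(2))  # closed form for Pr(-1 < Z < 1)
print(round(approx, 4), round(exact, 4))
```

For a continuous variable the area over a single point is zero, which is why we always talk about intervals.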
The Normal Distribution
- A random variable is a number we are NOT sure about, but we might have some idea of how to describe its potential outcomes. The Normal distribution is the most widely used probability distribution to describe a random variable.
- The probability the number ends up in an interval is given by the area under the curve (pdf).
[Figure: the standard normal pdf, plotted for values from −4 to 4]
- The standard Normal distribution has mean 0 and variance 1.
- Notation: If Z ∼ N(0, 1) (Z is the random variable), then
  Pr(−1 < Z < 1) = 0.68
  Pr(−1.96 < Z < 1.96) = 0.95
[Figure: two panels of the standard normal pdf over z from −4 to 4, illustrating these two probabilities]
Note: For simplicity we will often use P(−2 < Z < 2) ≈ 0.95.
Questions:
- What is Pr(Z < 2)? How about Pr(Z ≤ 2)?
- What is Pr(Z < 0)?
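These probabilities can be checked with Python’s standard-library `statistics.NormalDist`, whose `cdf(x)` returns Pr(Z ≤ x). The sketch below, offered as a way to verify answers rather than the course’s official tool, computes interval probabilities as differences of the cdf.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

# Interval probabilities: Pr(a < Z < b) = cdf(b) - cdf(a)
print(round(Z.cdf(1) - Z.cdf(-1), 2))        # 0.68
print(round(Z.cdf(1.96) - Z.cdf(-1.96), 2))  # 0.95
print(round(Z.cdf(2) - Z.cdf(-2), 4))        # the "≈ 0.95" shortcut is really 0.9545

# For a continuous variable Pr(Z = 2) = 0, so Pr(Z < 2) and Pr(Z <= 2) coincide.
print(round(Z.cdf(2), 4), Z.cdf(0))
```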
- The standard normal is not that useful by itself. When we say “the normal distribution”, we really mean a family of distributions.
- We obtain pdfs in the normal family by shifting the bell curve around and spreading it out (or tightening it up).
- We write X ∼ N(µ, σ²): “Normal distribution with mean µ and variance σ²”.
- The parameter µ determines where the curve is. The center of the curve is µ.
- The parameter σ determines how spread out the curve is. The area under the curve in the interval (µ − 2σ, µ + 2σ) is about 95%.
  Pr(µ − 2σ < X < µ + 2σ) ≈ 0.95
[Figure: normal pdf with µ − 2σ, µ − σ, µ, µ + σ and µ + 2σ marked on the horizontal axis]
- For the normal family of distributions, we can see that the parameter µ talks about “where” the distribution is located or centered.
- We often use µ as our best guess for a prediction.
- The parameter σ talks about how spread out the distribution is. This gives us an indication of how uncertain or how risky our prediction is.
- If X is any random variable, the mean will be a measure of the location of the distribution and the variance will be a measure of how spread out it is.
- Example: Below are the pdfs of X1 ∼ N(0, 1), X2 ∼ N(3, 1), and X3 ∼ N(0, 16).
- Which pdf goes with which X?
[Figure: three normal pdfs plotted over the range −8 to 8]
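One way to match each curve to its distribution is to compare where each pdf peaks and how tall the peak is. The sketch below uses `statistics.NormalDist`, which is parameterized by the standard deviation, so N(0, 16) becomes `sigma=4`; the dictionary labels are just for printing.

```python
from statistics import NormalDist

# NormalDist takes the standard deviation, so N(0, 16) has sigma = 4.
candidates = {
    "X1 ~ N(0, 1)": NormalDist(mu=0, sigma=1),
    "X2 ~ N(3, 1)": NormalDist(mu=3, sigma=1),
    "X3 ~ N(0, 16)": NormalDist(mu=0, sigma=4),
}

for name, dist in candidates.items():
    # The peak sits at mu; its height 1 / (sigma * sqrt(2*pi)) shrinks as sigma grows.
    print(f"{name}: centered at {dist.mean}, peak height {dist.pdf(dist.mean):.3f}")
```

X1 and X2 have the same shape but different centers, while X3 shares X1’s center but is four times wider and correspondingly flatter.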
The Normal Distribution – Example
- Assume the annual returns on the SP500 are normally distributed with mean 6% and standard deviation 15%. SP500 ∼ N(6, 225). (Notice: 15² = 225.)
- Two questions: (i) What is the chance of losing money in a given year? (ii) What is the value such that there’s only a 2% chance of losing that or more?
- Lloyd Blankfein: “I spend 98% of my time thinking about 2% probability events!”
- (i) Pr(SP500 < 0) and (ii) Pr(SP500 < ?) = 0.02
The Normal Distribution – Example
[Figure: two panels of the N(6, 225) pdf for sp500 from −40 to 60, titled “prob less than 0” and “prob is 2%”]
- (i) Pr(SP500 < 0) ≈ 0.34 and (ii) Pr(SP500 < −25) = 0.02
- In Excel: NORMDIST and NORMINV (homework!)
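Outside Excel, the same two numbers can be obtained with Python’s `statistics.NormalDist`: `cdf` plays the role of NORMDIST with cumulative set to TRUE, and `inv_cdf` plays the role of NORMINV. The sketch below assumes the slide’s model of annual returns in percent, N(6, 225), i.e. standard deviation 15.

```python
from statistics import NormalDist

# Annual S&P500 return in percent, as assumed on the slide: mean 6, sd 15.
sp500 = NormalDist(mu=6, sigma=15)

# (i) Chance of losing money: Pr(SP500 < 0).
#     Analogue of Excel's NORMDIST(0, 6, 15, TRUE).
p_loss = sp500.cdf(0)

# (ii) The 2% quantile: the value v with Pr(SP500 < v) = 0.02.
#      Analogue of Excel's NORMINV(0.02, 6, 15).
v = sp500.inv_cdf(0.02)

print(round(p_loss, 3), round(v, 1))
```

The quantile comes out near −24.8%, which rounds to the −25 used on the slide.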
1. Note: In X ∼ N(µ, σ²), µ is the mean and σ² is the variance.
2. Standardization: if X ∼ N(µ, σ²), then
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
3. Summary: X ∼ N(µ, σ²):
   µ: where the curve is
   σ: how spread out the curve is
   95% chance X ∈ µ ± 2σ.
The Normal Distribution – Another Example
Prior to the 1987 crash, monthly S&P500 returns (r) followed (approximately) a normal distribution with mean 0.012 and standard deviation 0.043. How extreme was the crash of −0.2176? Standardization helps us interpret these numbers...
$$r \sim N(0.012,\ 0.043^2)$$
$$z = \frac{r - 0.012}{0.043} \sim N(0, 1)$$
For the crash,
$$z = \frac{-0.2176 - 0.012}{0.043} \approx -5.34$$
How extreme is this z-value? More than 5 standard deviations away!!
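The sketch below standardizes the crash return and asks how likely a month at least that bad would be if returns really were N(0.012, 0.043²). It also confirms that standardizing does not change probabilities. The tail probability is only as trustworthy as the normal assumption; real returns are widely considered heavier-tailed than the normal, so read it as a statement about the model.

```python
from statistics import NormalDist

mu, sigma = 0.012, 0.043  # pre-1987 monthly S&P500 returns (from the slide)
crash = -0.2176

z = (crash - mu) / sigma           # standardize: z ~ N(0, 1)
tail = NormalDist().cdf(z)         # Pr(Z <= z) under the normal model

# Standardizing does not change probabilities: same answer on the original scale.
same = NormalDist(mu, sigma).cdf(crash)

print(round(z, 2), f"{tail:.1e}", f"{same:.1e}")
```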

Regression to the Mean
- Imagine your performance on a task follows a standard normal distribution, i.e., N(0, 1)... Say you perform that task today and score 2.
- If you perform the same task tomorrow, what is the probability you are going to do worse? 97.5%, right?
- This is called regression to the mean!!
Make sure to read the article on this topic available on the class website...
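A small simulation shows the effect. It assumes, as the slide does, that each day’s score is an independent N(0, 1) draw, so today’s score carries no information about tomorrow’s. It keeps the people who scored 2 or more today (the cutoff, sample size and seed are illustrative choices) and looks at how they do tomorrow.

```python
import random
from statistics import NormalDist, fmean

# Assumption: each day's score is an independent N(0, 1) draw.
random.seed(7)
n = 200_000
today = [random.gauss(0, 1) for _ in range(n)]
tomorrow = [random.gauss(0, 1) for _ in range(n)]

# Keep only the people who did very well today (score of 2 or more).
stars = [(t, s) for t, s in zip(today, tomorrow) if t >= 2]
worse = sum(1 for t, s in stars if s < t) / len(stars)

print(round(NormalDist().cdf(2), 3))         # Pr(tomorrow < 2), about 0.977
print(round(fmean(t for t, _ in stars), 2),  # stars' average today (well above 0)
      round(fmean(s for _, s in stars), 2),  # their average tomorrow (close to 0)
      round(worse, 3))                       # share who did worse tomorrow
```

The top scorers’ average falls back toward 0 the next day even though nothing about them changed, which is the essence of regression to the mean.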