Statistical View of Regression A MATLAB Tutorial
Introduction to Regression
Goal: Express the relationship between two (or more) variables by a mathematical formula.
x is the predictor (independent) variable
y is the response (dependent) variable
Footwear impressions are commonly observed at crime scenes. While there are numerous forensic properties that can be obtained from these impressions, one in particular is the shoe size. The detectives would like to be able to estimate the height of the impression maker from the shoe size.
[Figure: Determining Height from Shoe Size — scatter plot of Height (in) vs. Shoe Size (Men's)]
What is the predictor? What is the response?
Can the height of the impression maker be accurately estimated from the shoe size?
If a shoe is size 11, what would you advise the police?
What if the size is 7? Size 12.5?
[Figure: simulated noisy sinusoid y(x) = A sin(ωx + φ) + ε(x); parameters on the slide: A = 1, ω = π/2, φ = π, σ = 0.5]
Regression Modeling
We want to estimate m(x) and possibly the distribution of ε(x). There are two general situations:

Theoretical Models
m(x) is of some known (or hypothesized) form but with some parameters unknown (e.g. the sinusoid with A, ω, φ unknown).

Empirical Models
m(x) is constructed from the observed data (e.g. shoe size and height).

We often end up using both: constructing models from the observed data and prior knowledge.
Assumptions

These assumptions are only on the error term, ε(x) = y(x) − m(x):

A1 (Mean zero): E[ε(x)] = 0 for all x
A2 (Constant variance): Var[ε(x)] = σ² for all x
A3 (Uncorrelated): ε(x) and ε(x′) are uncorrelated for x ≠ x′
Residuals
The residuals e(x_i) = y(x_i) − m̂(x_i) can be used to check the estimated model m̂(x). If the model fit is good, the residuals should satisfy our three assumptions.
A1 - Mean 0
[Figure: two residual plots e(x) vs. x — left: violates A1; right: satisfies A1]
A2 - Constant Variance
[Figure: two residual plots e(x) vs. x — left: violates A2; right: satisfies A2]
A3 - Uncorrelated
[Figure: two residual plots e(x) vs. x — left: violates A3; right: satisfies A3]
[Figure: scatter plot of Height (in) vs. Shoe Size]
m(x) = β₀ + β₁ x

        Line #1   Line #2   Line #3
  β₀     48.6      51.5      45.0
  β₁      1.9       1.6       2.3

[Figure: Height (in) vs. Shoe Size with the three candidate lines overlaid]
Write the observed data as

y_i = β₀ + β₁ x_i + ε_i,   i = 1, 2, ..., n

where
y_i ≡ y(x_i) is the response value for observation i
β₀ and β₁ are the unknown parameters (regression coefficients)
x_i is the predictor value for observation i
ε_i ≡ ε(x_i) is the random error for observation i
Let g(x) ≡ g(x; β) be an estimator for y(x). Define a Loss Function, L(y(x), g(x)), which describes how far g(x) is from y(x).

Example (Squared Error Loss): L(y(x), g(x)) = (y(x) − g(x))²

The best predictor minimizes the Risk (or expected Loss), R(x) = E[L(y(x), g(x))]:

g*(x) = arg min_{g∈G} E[L(y(x), g(x))]
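For squared error loss the risk minimizer can be found explicitly. A short standard derivation (not on the original slide), using E[y(x)] = m(x) from assumption A1:

```latex
\begin{aligned}
R(x) &= E\big[(y(x)-g(x))^2\big] \\
     &= E\big[(y(x)-m(x))^2\big] + \big(m(x)-g(x)\big)^2
        + 2\big(m(x)-g(x)\big)\,\underbrace{E\big[y(x)-m(x)\big]}_{=0} \\
     &= \operatorname{Var}[\varepsilon(x)] + \big(m(x)-g(x)\big)^2,
\end{aligned}
```

which is minimized by choosing g*(x) = m(x): under squared error loss, the best predictor is the mean function itself.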
If we assume a squared error loss function, L(y_i, m_i) = (y_i − (β₀ + β₁x_i))², an approximation to the Risk function is the Sum of Squared Errors (SSE):

R(β₀, β₁) = Σ_{i=1}^n (y_i − (β₀ + β₁x_i))²

Then it makes sense to estimate (β₀, β₁) by the values that minimize R(β₀, β₁):

(β̂₀, β̂₁) = arg min_{β₀,β₁} R(β₀, β₁)
R(β₀, β₁) = Σ_{i=1}^n (y_i − (β₀ + β₁x_i))²

Differentiate the Risk function with respect to the unknown parameters and equate to 0:

∂R/∂β₀ = −2 Σ_{i=1}^n (y_i − (β₀ + β₁x_i)) = 0
∂R/∂β₁ = −2 Σ_{i=1}^n x_i (y_i − (β₀ + β₁x_i)) = 0
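Solving the two normal equations simultaneously yields the familiar closed-form least squares estimates (a standard result, stated here for completeness):

```latex
\hat{\beta}_1 \;=\; \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2},
\qquad
\hat{\beta}_0 \;=\; \bar{y}-\hat{\beta}_1\,\bar{x}.
```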
Line #2!

Determining Height from Shoe Size

[Figure: scatter plot with Lines #1, #2, #3 overlaid]
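The fitted line can be checked numerically. A minimal sketch in Python (the workshop itself uses MATLAB; the shoe-size data below are made up for illustration, since the actual data set is not reproduced on the slides):

```python
import numpy as np

# Hypothetical (shoe size, height) pairs -- illustrative only
x = np.array([8.0, 9.0, 10.0, 11.0, 12.0, 13.0])
y = np.array([66.0, 68.0, 69.0, 71.0, 72.0, 74.0])

# Closed-form least squares estimates from the normal equations
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

# np.polyfit minimizes the same SSE, so the two should agree
b1_np, b0_np = np.polyfit(x, y, 1)
```

For a size-11 shoe, the fitted height is b0 + 11 * b1.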
Residuals
The fitted value ŷ_i for the ith observation is ŷ_i = β̂₀ + β̂₁x_i. The residual e_i is the difference between the observed and fitted value, e_i = y_i − ŷ_i. The residuals are used to check whether our three assumptions appear valid.
[Figure: residual plot for the shoe-size fit]

Example of poor fit

[Figure: scatter plot of the data with fitted curve (left) and the corresponding residual plot e(x) (right)]
Polynomial models:

m(x) = Σ_{k=0}^p β_k x^k

Danger of over-fitting
[Figure: two polynomial fits y(x) (top row) with their residual plots e(x) (bottom row), illustrating over-fitting]
For the polynomial model y_i = Σ_{k=0}^p β_k x_i^k + ε_i, write Y = Xβ + ε in matrix form, where

Y = [y₁, y₂, ..., y_n]ᵀ

X =
  [ 1   x₁   x₁²  ...  x₁ᵖ  ]
  [ 1   x₂   x₂²  ...  x₂ᵖ  ]
  [ ⋮    ⋮    ⋮    ⋱    ⋮   ]
  [ 1   x_n  x_n² ...  x_nᵖ ]

β = [β₀, β₁, ..., β_p]ᵀ,   ε = [ε₁, ε₂, ..., ε_n]ᵀ
To minimize the SSE (Sum of Squared Errors), use the Risk function

R(β) = (Y − Xβ)ᵀ(Y − Xβ)

Taking the derivative with respect to β gives the Normal Equations

XᵀXβ = XᵀY

The least squares solution for β is

β̂ = (XᵀX)⁻¹XᵀY

Hint: See Linear Inverse Problems: A MATLAB Tutorial by Qin Zhang
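In code, the design matrix and the least squares solution look like this. A sketch with simulated data (the quadratic trend and noise level are assumptions for illustration); note that in practice one avoids forming (XᵀX)⁻¹ explicitly and uses a QR/SVD-based solver instead:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: quadratic trend plus noise (assumed for illustration)
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 - 1.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)

p = 2
X = np.vander(x, p + 1, increasing=True)   # columns: 1, x, x^2

# Normal equations: solve (X^T X) beta = X^T Y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically safer: QR/SVD-based least squares
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both routes give the same coefficients on well-conditioned problems; lstsq is preferred when XᵀX is nearly singular (e.g. high-degree polynomials).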
STRETCH BREAK!!!
MATLAB Demonstration
Linear Least Squares
Model Selection
How can we compare models and select a final one?
How many terms should be included in polynomial models?
What is the danger of over-fitting (including too many terms)?
What is the problem with under-fitting (not including enough terms)?
Estimating Variance

Recall assumptions A1, A2, and A3. For our fitted model, the residuals e_i = y_i − ŷ_i can be used to estimate Var[ε(x)] = σ².

Hint: See Basic Statistical Concepts and Some Probability Essentials by Justin Shows and Betsy Enstrom
Estimating Variance

The sample variance for a rv z is

s²_z = (1/(n − 1)) Σ_{i=1}^n (z_i − z̄)²

Analogously, an estimator for σ² is

σ̂² = (1/(n − (p + 1))) Σ_{i=1}^n e_i²

where the degrees of freedom are df = n − (p + 1); there are p + 1 unknown parameters in the model.
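As a sketch of the estimator in code (the data values are made up for illustration):

```python
import numpy as np

# Illustrative data: roughly linear with a little noise (assumed values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.1, 2.8, 4.2, 4.9, 6.1])

p = 1
X = np.vander(x, p + 1, increasing=True)   # columns: 1, x
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

e = y - X @ beta                 # residuals e_i = y_i - yhat_i
df = len(y) - (p + 1)            # degrees of freedom n - (p + 1)
sigma2_hat = np.sum(e**2) / df   # estimate of Var[eps(x)]
```

Dividing by n − (p + 1) rather than n compensates for the p + 1 parameters already estimated from the same data.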
Statistical Inference
An additional assumption (A4): the errors are normally distributed, ε(x) ~ N(0, σ²).
Statistical Inference
Distributions

Using A1-A4:

y(x₀) = x₀ᵀβ + ε(x₀),   so   y(x₀) ~ N(x₀ᵀβ, σ²)

β̂ ~ MVN(β, σ²(XᵀX)⁻¹)

From these we can find CIs and perform hypothesis tests.
Model Comparison: R²

SSE = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n e_i² = eᵀe

SST = Σ_{i=1}^n (y_i − ȳ)²

R² = 1 − SSE/SST

R² is a measure of how much better a regression model is than the intercept-only model.
Model Comparison: Adjusted R²

What happens to R² if you add more terms to the model? Since R² = 1 − SSE/SST and adding terms can only decrease the SSE, R² never decreases.

Adjusted R² penalizes by the number of terms (p + 1) in the model:

R²_adj = 1 − [SSE/(n − (p + 1))] / [SST/(n − 1)] = 1 − σ̂² / [SST/(n − 1)]
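The contrast between the two measures can be checked numerically. A sketch with simulated data (the underlying line and noise level are assumptions for illustration): R² can only go up as polynomial terms are added, while adjusted R² pays for them.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x + 0.2 * rng.standard_normal(x.size)   # truly linear

def r2_stats(x, y, p):
    """R^2 and adjusted R^2 for a degree-p polynomial fit."""
    X = np.vander(x, p + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta)**2)
    sst = np.sum((y - y.mean())**2)
    n = len(y)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - (p + 1))) / (sst / (n - 1))
    return r2, r2_adj

r2_lin, adj_lin = r2_stats(x, y, 1)   # correct model
r2_big, adj_big = r2_stats(x, y, 9)   # over-fitted model
```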
MATLAB Demonstration
cftool
Nonlinear Regression

A linear regression model can be written

y(x) = Σ_{k=0}^p β_k h_k(x) + ε(x)

Nonlinear regression takes the general form

y(x) = m(x; θ) + ε(x)

for some specified function m(x; θ) with unknown parameters θ.

Example: The sinusoid we looked at earlier, y(x) = A sin(ωx + φ) + ε(x), with parameters θ = (A, ω, φ), is a nonlinear model.
Nonlinear Regression: Parameter Estimation

Making the same assumptions as in linear regression (A1-A3), the least squares solution is still valid:

θ̂ = arg min_θ Σ_{i=1}^n (y_i − m(x_i; θ))²

Unfortunately, this usually doesn't have a closed-form solution (unlike the linear case). Approaches to finding the solution will be discussed later in the workshop. But that won't stop us from using nonlinear (and nonparametric) regression in MATLAB!
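The workshop does this in MATLAB (e.g. cftool or lsqcurvefit). As an illustrative sketch of the same idea, here is the sinusoid fit using SciPy's curve_fit; the simulated data, noise level, and starting guess are all assumptions, and iterative solvers like this need a reasonable initial guess to converge:

```python
import numpy as np
from scipy.optimize import curve_fit

def sinusoid(x, A, w, phi):
    """Model m(x; theta) = A sin(w x + phi)."""
    return A * np.sin(w * x + phi)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 8.0, 200)
y = sinusoid(x, 1.0, np.pi / 2, 1.0) + 0.05 * rng.standard_normal(x.size)

# Starting guess near the truth; far-off guesses may find a wrong optimum
theta0 = (0.9, 1.5, 0.9)
theta_hat, _ = curve_fit(sinusoid, x, y, p0=theta0)
A_hat, w_hat, phi_hat = theta_hat
```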
Weighted Regression

Recall the risk functions we have used so far:

R(θ) = Σ_{i=1}^n L(y_i, m_i)

Each observation contributes equally to the risk. Weighted regression uses the risk function

R_w(θ) = Σ_{i=1}^n w_i L(y_i, m_i)

so observations with larger weights are more important. Some examples:

w_i = 1/σ_i²
w_i = 1/x_i
w_i = 1/y_i
w_i = k/|e_i|  (Robust Regression)
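A common trick: multiplying row i by √w_i turns the weighted squared-error risk into an ordinary SSE, so the linear least squares machinery applies unchanged. A sketch with weights w_i = 1/σ_i² (the data and noise model are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 40)
sigma = 0.1 * x                       # noise sd grows with x (violates A2)
y = 3.0 + 0.5 * x + sigma * rng.standard_normal(x.size)

w = 1.0 / sigma**2                    # weights w_i = 1 / sigma_i^2
X = np.column_stack([np.ones_like(x), x])

# Rescale rows by sqrt(w_i), then solve ordinary least squares
sw = np.sqrt(w)
beta_w, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Equivalent: weighted normal equations X^T W X beta = X^T W y
beta_check = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```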
Transformations

Sometimes transformations are used to obtain better models:
Transform the predictors, x → x̃
Transform the response, y → ỹ

Make sure assumptions A1-A3 (and A4) are still valid.

Examples:
Standardize: x̃ = (x − x̄)/s_x
Log: ỹ = log(y)
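Both example transformations are one-liners. A minimal sketch (the data values are assumptions for illustration):

```python
import numpy as np

x = np.array([6.0, 8.0, 9.0, 11.0, 12.0, 14.0])
y = np.array([1.5, 2.7, 4.1, 8.2, 12.0, 25.0])   # grows multiplicatively

x_std = (x - x.mean()) / x.std(ddof=1)   # standardized predictor
y_log = np.log(y)                        # log-transformed response
```

After transforming, refit and re-check the residuals: the assumptions must hold on the transformed scale.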
The Competition

A contest to see who can construct the best model in cftool:
Get into groups
Data can be found in competition data.m
Scoring will be performed on a testing set
The goal is to minimize the sum of squared errors
When your group is ready, enter your model into this computer
MATLAB Help

There is lots of good assistance in the MATLAB help window. Specifically, look at the Demos tab on the help window. The toolboxes for Statistics (Regression) and Optimization may be particularly useful for this workshop.