
Regression and Least Squares: A MATLAB Tutorial

Dr. Michael D. Porter


porter@stat.ncsu.edu
Department of Statistics, North Carolina State University and SAMSI

Tuesday, May 20, 2008

Introduction to Regression
Goal: Express the relationship between two (or more) variables by a mathematical formula.
x is the predictor (independent) variable.
y is the response (dependent) variable.

We specifically want to indicate how y varies as a function of x.

y(x) is considered a random variable, so it can never be predicted perfectly.

Example: Relating Shoe Size to Height


The problem

Footwear impressions are commonly observed at crime scenes. While there are numerous forensic properties that can be obtained from these impressions, one in particular is the shoe size. The detectives would like to be able to estimate the height of the impression maker from the shoe size.


Example: Relating Shoe Size to Height


The data

[Figure: scatter plot "Determining Height from Shoe Size"; Height (in) versus Shoe Size (Men's)]

Data taken from: http://staff.imsa.edu/brazzle/E2Kcurr/Forensic/Tracks/TracksSummary.html

Example: Relating Shoe Size to Height

Your answers

[Figure: scatter plot "Determining Height from Shoe Size"; Height (in) versus Shoe Size (Men's)]

1. What is the predictor? What is the response?
2. Can the height of the impression maker be accurately estimated from the shoe size?
3. If a shoe is size 11, what would you advise the police?
4. What if the size is 7? Size 12.5?

General Regression Model


Assume the true model is of the form

    y(x) = m(x) + ε(x)

The systematic part m(x) is deterministic.
The error ε(x) is a random variable, reflecting measurement error and natural variation due to exogenous factors.
The error is additive; therefore y(x) is also a random variable.

Example: Sinusoid Function



    y(x) = A sin(ωx + φ) + ε(x)

with A = 1, ω = π/2, φ = π, σ = 0.5.

Here A is the amplitude, ω the angular frequency, φ the phase, and ε(x) ~ N(0, σ²) the random error.

[Figure: simulated observations y(x) and the mean function m(x) plotted over x]
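A figure like this can be reproduced with a few lines of MATLAB. This is a sketch: the parameter values come from the slide above, but the x-range and number of points are assumed here for illustration.

    % Simulate the noisy sinusoid y(x) = A*sin(w*x + phi) + eps(x)
    A = 1; w = pi/2; phi = pi; sigma = 0.5;   % parameters from this slide
    x = linspace(-10, 10, 200)';              % assumed plotting range
    m = A * sin(w*x + phi);                   % systematic part m(x)
    y = m + sigma * randn(size(x));           % additive N(0, sigma^2) error
    plot(x, y, '.', x, m, '-')
    legend('y(x)', 'm(x)'); xlabel('x')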

Regression Modeling
We want to estimate m(x) and possibly the distribution of ε(x). There are two general situations:

Theoretical Models: m(x) is of some known (or hypothesized) form but with some parameters unknown (e.g., the sinusoid function with A, ω, φ unknown).

Empirical Models: m(x) is constructed from the observed data (e.g., shoe size and height).

We often end up using both: constructing models from the observed data and prior knowledge.

The Standard Assumptions


    y(x) = m(x) + ε(x)

A1: E[ε(x)] = 0                          (mean 0)
A2: Var[ε(x)] = σ²                       (homoskedastic)
A3: Cov[ε(x), ε(x′)] = 0 for x ≠ x′      (uncorrelated)

These assumptions are only on the error term ε(x) = y(x) - m(x).

Residuals
The residuals e(xi) = y(xi) - m̂(xi) can be used to check the estimated model m̂(x). If the model fit is good, the residuals should satisfy our three assumptions.

A1 - Mean 0
[Figure: two residual plots; the left panel violates A1 (the residuals do not have mean 0), the right panel satisfies A1]

A2 - Constant Variance
[Figure: two residual plots; the left panel violates A2 (the residual variance is not constant), the right panel satisfies A2]

A3 - Uncorrelated
[Figure: two residual plots; the left panel violates A3 (the residuals are correlated), the right panel satisfies A3]

Back to the Shoes


How can we estimate m(x) for the shoe example?

Non-parametric: for each shoe size, take the mean of the observed heights.
Parametric: assume the trend is linear.

[Figure: "Determining Height from Shoe Size" scatter plot with the local-mean estimate and the linear trend overlaid]

Simple Linear Regression


Simple linear regression assumes that m(x) is of the parametric form

    m(x) = β0 + β1 x

which is the equation for a line.

Simple Linear Regression


Which line is the best estimate?

[Figure: "Determining Height from Shoe Size" scatter plot with three candidate lines]

    m(x) = β0 + β1 x

    Line      β0      β1
    #1        48.6    1.9
    #2        51.5    1.6
    #3        45.0    2.3

Estimating Parameters in Linear Regression


Data

Write the observed data as

    yi = β0 + β1 xi + εi,   i = 1, 2, . . . , n

where yi ≡ y(xi) is the response value for observation i; β0 and β1 are the unknown parameters (the regression coefficients); xi is the predictor value for observation i; and εi ≡ ε(xi) is the random error for observation i.

Estimating Parameters in Linear Regression


Statistical Decision Theory

Let g(x) ≡ g(x; β) be an estimator for y(x). Define a loss function L(y(x), g(x)) that describes how far g(x) is from y(x); an example is squared error loss, L(y(x), g(x)) = (y(x) - g(x))². The best predictor minimizes the risk (expected loss) R(x) = E[L(y(x), g(x))]:

    g*(x) = arg min_{g ∈ G} E[L(y(x), g(x))]

Estimating Parameters in Linear Regression


Method of Least Squares

If we assume a squared error loss function L(yi, mi) = (yi - (β0 + β1 xi))², an approximation to the risk function is the Sum of Squared Errors (SSE):

    R(β0, β1) = Σ_{i=1}^{n} (yi - (β0 + β1 xi))²

Then it makes sense to estimate (β0, β1) by the values that minimize R(β0, β1):

    (β̂0, β̂1) = arg min_{β0, β1} R(β0, β1)

Estimating Parameters in Linear Regression


Derivation of Linear Least Squares Solution
    R(β0, β1) = Σ_{i=1}^{n} (yi - (β0 + β1 xi))²

Differentiate the risk function with respect to the unknown parameters and equate to 0:

    ∂R/∂β0 = -2 Σ_{i=1}^{n} (yi - (β0 + β1 xi)) = 0
    ∂R/∂β1 = -2 Σ_{i=1}^{n} xi (yi - (β0 + β1 xi)) = 0

Estimating Parameters in Linear Regression


Linear Least Squares Solution
    R(β0, β1) = Σ_{i=1}^{n} (yi - (β0 + β1 xi))²

The least squares estimates are

    β̂1 = (Σ_{i=1}^{n} xi yi - n x̄ ȳ) / (Σ_{i=1}^{n} xi² - n x̄²)

    β̂0 = ȳ - β̂1 x̄

where x̄ and ȳ are the sample means of the xi's and yi's.
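These closed-form estimates translate directly into MATLAB. A sketch, assuming the shoe sizes and heights are stored in column vectors x and y:

    % Simple linear regression estimates via the closed-form solution
    n    = length(x);
    xbar = mean(x);  ybar = mean(y);
    b1 = (sum(x.*y) - n*xbar*ybar) / (sum(x.^2) - n*xbar^2);  % slope
    b0 = ybar - b1*xbar;                                      % intercept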

And the winner is ...

Line #2!

[Figure: "Determining Height from Shoe Size" scatter plot with the three candidate lines]

For these data: x̄ = 11.03, ȳ = 69.31, β̂0 = 51.46, β̂1 = 1.62.

Residuals
The fitted value ŷi for the ith observation is

    ŷi = β̂0 + β̂1 xi

The residual ei is the difference between the observed and fitted values:

    ei = yi - ŷi

The residuals are used to check whether our three assumptions appear valid.
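In MATLAB, the fitted values and residuals follow directly from the estimates b0 and b1 computed in the sketch above (x and y are the observed data):

    % Fitted values and residuals, plus a quick residual plot
    yhat = b0 + b1*x;     % fitted values
    e    = y - yhat;      % residuals
    plot(x, e, 'o'); xlabel('x'); ylabel('residual')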

Residuals for shoe size data


[Figure: residuals from the fitted line plotted against Shoe Size (Men's), scattered around 0]

Example of poor fit

[Figure: left panel, scatter plot of y(x) versus x with a straight-line fit; right panel, the corresponding residual plot, which shows a systematic pattern]

Adding Polynomial Terms in the Linear Model

Modeling the mean trend as a line doesn't seem to fit extremely well in the above example; there is a systematic lack of fit. Consider a polynomial form for the mean:

    m(x) = β0 + β1 x + β2 x² + . . . + βp x^p = Σ_{k=0}^{p} βk x^k

This is still considered a linear model, because m(x) is a linear combination of the βk. There is, however, a danger of over-fitting.
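In MATLAB, polynomial fits of this kind can be done with polyfit, which solves the same least squares problem. A sketch; the vectors x and y and the choice of degree 2 are assumed for illustration:

    % Compare a straight-line fit with a quadratic fit
    p1 = polyfit(x, y, 1);                 % 1st-order (line)
    p2 = polyfit(x, y, 2);                 % quadratic
    xg = linspace(min(x), max(x), 100);    % grid for plotting
    plot(x, y, 'o', xg, polyval(p1, xg), '--', xg, polyval(p2, xg), '-')
    legend('data', '1st order', 'quadratic')

Note that increasing the polynomial order always reduces the SSE on the data used for fitting; that is precisely the over-fitting danger mentioned above.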

Quadratic Fit: y(x) = β0 + β1 x + β2 x² + ε(x)

[Figure: top, scatter plot with the 1st-order and quadratic fits overlaid; bottom, residual plots for the linear and quadratic fits]

Matrix Approach to Linear Least Squares


Setup

Previously, we wrote our data as yi = Σ_{k=0}^{p} βk xi^k + εi. In matrix notation this becomes

    Y = Xβ + ε

with

    Y = [y1, y2, . . . , yn]ᵀ
    β = [β0, β1, . . . , βp]ᵀ
    ε = [ε1, ε2, . . . , εn]ᵀ

and X the n × (p+1) design matrix whose ith row is [1, xi, xi², . . . , xi^p].

How many unknown parameters are in the model?

Matrix Approach to Linear Least Squares

Solution

To minimize the SSE (Sum of Squared Errors), use the risk function

    R(β) = (Y - Xβ)ᵀ(Y - Xβ)

Taking the derivative with respect to β gives the Normal Equations:

    XᵀX β = XᵀY

The least squares solution for β is

    β̂ = (XᵀX)⁻¹ XᵀY

Hint: See Linear Inverse Problems: A MATLAB Tutorial by Qin Zhang
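In MATLAB it is better, numerically, to avoid forming the inverse explicitly; the backslash operator solves the same least squares problem. A sketch, assuming x and y are column vectors and a quadratic model (p = 2):

    % Design matrix and least squares solution via backslash
    X = [ones(size(x)), x, x.^2];   % columns: 1, x, x^2
    beta_hat = X \ y;               % solves the least squares problem
    % mathematically equivalent to inv(X'*X)*(X'*y), but more stable

For a full-rank overdetermined system, backslash generally works through a QR factorization rather than forming (XᵀX)⁻¹, which is why it is preferred in practice.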

STRETCH BREAK!!!


MATLAB Demonstration
Linear Least Squares

MATLAB Demo #1: Open Regression_Intro.m

Model Selection
How can we compare and select a final model?
How many terms should be included in polynomial models?
What is the danger of over-fitting (including too many terms)?
What is the problem with under-fitting (not including enough terms)?

Estimating Variance

Recall assumptions A1, A2, and A3. For our fitted model, the residuals ei = yi - ŷi can be used to estimate Var[ε(x)].

Hint: See Basic Statistical Concepts and Some Probability Essentials by Justin Shows and Betsy Enstrom

The sample variance for a random variable z is

    s²_z = (1/(n - 1)) Σ_{i=1}^{n} (zi - z̄)²

The estimator for the regression problem is similar:

    σ̂² = SSE/df = (1/(n - (p + 1))) Σ_{i=1}^{n} ei²

where the degrees of freedom df = n - (p + 1), since there are p + 1 unknown parameters in the model.
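Continuing the matrix-based sketch from earlier (X, beta_hat, and y assumed already defined):

    % Estimate the error variance from the residuals
    n = length(y);
    p = size(X,2) - 1;                        % model has p+1 coefficients
    e = y - X*beta_hat;                       % residuals
    sigma2_hat = sum(e.^2) / (n - (p + 1));   % SSE / df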

Statistical Inference
An additional assumption

In order to calculate confidence intervals (C.I.), we need a distributional assumption on ε(x). Up to now, we haven't needed one. The standard assumption is a Normal (Gaussian) distribution:

    A4: ε(x) ~ N(0, σ²)

Statistical Inference
Distributions

Using y(x0) = x0ᵀβ + ε(x0) and β̂ = (XᵀX)⁻¹XᵀY, where x0 is a point in design space, the four assumptions give

    y(x0) ~ N(x0ᵀβ, σ²)
    β̂ ~ MVN(β, σ²(XᵀX)⁻¹)
    m̂(x0) = x0ᵀβ̂ ~ N(x0ᵀβ, σ² x0ᵀ(XᵀX)⁻¹x0)
    ŷ(x0) ~ N(x0ᵀβ, σ²(1 + x0ᵀ(XᵀX)⁻¹x0))

From these we can find CIs and perform hypothesis tests.
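As a rough sketch of how these distributions are used, approximate confidence intervals for the coefficients can be computed from sigma2_hat and X above. The 1.96 normal quantile is assumed here for simplicity; a t quantile would be more exact for small n:

    % Approximate 95% confidence intervals for beta_hat
    C  = sigma2_hat * inv(X'*X);   % estimated covariance of beta_hat
    se = sqrt(diag(C));            % standard errors of the coefficients
    ci = [beta_hat - 1.96*se, beta_hat + 1.96*se];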

Model Comparison: R²

Sum of Squares Error:

    SSE = Σ_{i=1}^{n} (yi - ŷi)² = Σ_{i=1}^{n} ei² = eᵀe

Sum of Squares Total:

    SST = Σ_{i=1}^{n} (yi - ȳ)²

This corresponds to the intercept-only model, ŷ(x) = ȳ.

Coefficient of Determination:

    R² = 1 - SSE/SST

R² is a measure of how much better a regression model is than the intercept-only model.

Model Comparison: Adjusted R²

What happens to R² if you add more terms to the model?

    R² = 1 - SSE/SST

Adjusted R² penalizes by the number of terms (p + 1) in the model:

    R²_adj = 1 - (SSE/(n - (p + 1))) / (SST/(n - 1)) = 1 - σ̂²/(SST/(n - 1))

Also see residual plots, Mallows' Cp, PRESS (cross-validation), AIC, etc.
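Both quantities are simple to compute in MATLAB. A sketch, reusing e, y, n, and p from the variance-estimation sketch above:

    % R^2 and adjusted R^2
    SSE   = sum(e.^2);
    SST   = sum((y - mean(y)).^2);
    R2    = 1 - SSE/SST;
    R2adj = 1 - (SSE/(n - (p + 1))) / (SST/(n - 1));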

MATLAB Demonstration
cftool

MATLAB Demo #2: Type cftool

Nonlinear Regression

A linear regression model can be written

    y(x) = Σ_{k=0}^{p} βk hk(x) + ε(x)

The mean m(x) is a linear combination of the β's.

Nonlinear regression takes the general form

    y(x) = m(x; θ) + ε(x)

for some specified function m(x; θ) with unknown parameters θ.

Example: The sinusoid we looked at earlier, y(x) = A sin(ωx + φ) + ε(x), with parameters θ = (A, ω, φ), is a nonlinear model.

Nonlinear Regression
Parameter Estimation

Making the same assumptions as in linear regression (A1-A3), the least squares solution is still valid:

    θ̂ = arg min_θ Σ_{i=1}^{n} (yi - m(xi; θ))²

Unfortunately, this usually doesn't have a closed-form solution (unlike the linear case). Approaches to finding the solution will be discussed later in the workshop. But that won't stop us from using nonlinear (and nonparametric) regression in MATLAB!
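Even before those methods are covered, base MATLAB can minimize the nonlinear SSE numerically. A sketch for the sinusoid using fminsearch; x and y are assumed, and the starting guess t0 is arbitrary (nonlinear fits can be sensitive to it):

    % Nonlinear least squares for y(x) = A*sin(w*x + phi) + eps(x)
    sse   = @(t) sum((y - t(1)*sin(t(2)*x + t(3))).^2);  % t = [A, w, phi]
    t0    = [1, 1, 0];              % starting guess (it matters!)
    t_hat = fminsearch(sse, t0);    % numerical minimizer of the SSE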

Off again to cftool


MATLAB Demo #3


Weighted Regression
Consider the risk function we have used so far:

    R(θ) = Σ_{i=1}^{n} (yi - m(xi; θ))²

Each observation contributes equally to the risk. Weighted regression uses the risk function

    Rw(θ) = Σ_{i=1}^{n} wi (yi - m(xi; θ))²

so that observations with larger weights are more important. Some examples:

    wi = 1/σi²     heteroskedastic errors (non-constant variance)
    wi = 1/xi
    wi = 1/yi
    wi = k/|ei|    robust regression
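For a linear model, the weighted risk has a closed-form solution. A sketch, assuming a design matrix X, response y, and a vector of known weights w (how to choose w is the modeling question above):

    % Weighted least squares for a linear model
    W      = diag(w);               % weight matrix
    beta_w = (X'*W*X) \ (X'*W*y);   % minimizes sum of w_i * e_i^2
    % lscov(X, y, w) computes the same estimate in one call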

Transformations
Sometimes transformations are used to obtain better models: transform the predictors (x → x̃) or transform the response (y → ỹ). Make sure assumptions A1-A3 (and A4) remain valid after transforming.

Examples:

    Standardize:   x̃ = (x - x̄)/sx
    Log:           ỹ = log(y)
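Both example transformations are one-liners in MATLAB. A sketch; x and y are the raw data, and the log transform assumes y > 0:

    % Standardize the predictor and log-transform the response
    xs = (x - mean(x)) / std(x);   % standardized predictor
    yl = log(y);                   % log response (requires y > 0)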

The Competition
Contest to see who can construct the best model in cftool:

Get into groups.
The data can be found in competition data.m.
Scoring will be performed on a testing set; you want to minimize the sum of squared errors.
When your group is ready, enter your model into this computer.

MATLAB Help
There is lots of good assistance in the MATLAB help window. Specifically, look at the Demos tab of the help window. The Statistics (regression) and Optimization Toolboxes may be particularly useful for this workshop.

Have a great workshop!
