
STAT1400 2022 1st Week4-Lecture 8


https://www.student.uwa.edu.au/learning/resources/ace/respect-intellectual-property/copyright-and-uwa-unit-content

Adriano Polpo (UWA) STAT 1400


STAT 1400 - Statistics for Science

stat1400-ems@uwa.edu.au

Contributors to lecture material: Adrian Baddeley, Adriano Polpo, John Bamberg, Ed Cripps, Julie Marsh, Kevin Murray,
Gordon Royle, and Berwin Turlach.



father/son heights
Which line is best?
Equation of a line

The equation of a line is

y = mx + c

where
m is the slope or gradient
c is the y-intercept.
Plotting lines

[Figure: the lines y = 1x + 8 and y = 0.5x + 3, plotted for x from 1 to 10.]
Residuals

[Figure: scatterplot with a line through the points; one vertical deviation from the line is labelled “residual”.]
Method of Least Squares

In 1805, the French mathematician Adrien-Marie Legendre proposed the method of least squares.

He suggested that the best line would be the one where the sum of the squares of the residuals is as small as possible.

Positive or negative deviations count equally.
Large deviations count significantly more.
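
The criterion can be tried out on a small example. The sketch below is only an illustration: the five data points and the two candidate lines are invented here, not taken from the lecture, and the function name `sum_squared_residuals` is a hypothetical helper.

```python
# Toy data invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_residuals(intercept, slope, xs, ys):
    """Sum of squared vertical gaps between each point and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Two hypothetical candidate lines to compare.
sse_a = sum_squared_residuals(0.0, 2.0, xs, ys)  # y = 2x
sse_b = sum_squared_residuals(1.0, 1.5, xs, ys)  # y = 1 + 1.5x

print(f"y = 2x        : {sse_a:.2f}")
print(f"y = 1 + 1.5x  : {sse_b:.2f}")
```

Squaring makes a gap of +0.2 and a gap of -0.2 contribute the same amount, while a single gap of 1.6 contributes far more than several small ones. Under the least squares criterion the first candidate line is the better of the two.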
Notation

We are given n data points

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$,

and want to find the line $y = b_0 + b_1 x$ so that

$\sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2$

is as small as possible.

(Statisticians use b1 and b0 rather than m and c for the gradient and y-intercept of the line.)
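
The minimising values of b0 and b1 have a well-known closed form, which the slides do not state: b1 = Sxy / Sxx and b0 = ȳ − b1 x̄, where Sxy is the sum of (xi − x̄)(yi − ȳ) and Sxx is the sum of (xi − x̄)². The following is a minimal sketch of that standard result; the function name `least_squares_line` and the toy data are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: sums of cross-products and squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # gradient
    b0 = y_bar - b1 * x_bar  # y-intercept
    return b0, b1


# Toy data invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x")
```

In practice a statistics package would do this calculation; the hand-written version is here only to show that nothing more than means and sums is involved.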
Line of best fit

The best linear equation fitting the father/son data is

y = 86.088 + 0.514x

So the best guess for the height of the son of a 180 cm father would be

86.088 + 0.514 × 180 ≈ 178.61 cm.
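
The same prediction can be written as a small function. This is a sketch using the rounded coefficients quoted on the slide; the names `B0`, `B1` and `predict_son_height` are hypothetical, and coefficients carried to more decimal places would give a slightly different answer.

```python
# Rounded coefficients of the fitted line, as quoted on the slide.
B0 = 86.088  # intercept, in cm
B1 = 0.514   # gradient: cm of son's height per cm of father's height


def predict_son_height(father_cm):
    """Predicted son's height (cm) from the fitted line."""
    return B0 + B1 * father_cm


print(round(predict_son_height(180), 2))
```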
The best line
The Anscombe Quartet

The Anscombe Quartet is a famous collection of four datasets, each containing eleven (x, y)-pairs.

Each of the datasets has (almost) the same summary statistics.

Each of the datasets has the same linear model.


Anscombe Quartet Scatterplots

Anscombe Regression Lines

The equation of the fitted line is y = 3 + 0.5x, with r = 0.82, in each case.
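
The claim can be checked numerically. The sketch below fits a least squares line to each dataset and computes the correlation r. The data values are the commonly published Anscombe (1973) figures, typed in here rather than taken from the slides, so they are worth checking against a trusted copy; the function name `fit_and_correlation` is a hypothetical helper.

```python
from math import sqrt

# Commonly published Anscombe (1973) values; not reproduced in the slides.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_and_correlation(xs, ys):
    """Return (b0, b1, r) for the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r = sxy / sqrt(sxx * syy)
    return b0, b1, r


results = {name: fit_and_correlation(xs, ys)
           for name, (xs, ys) in anscombe.items()}
for name, (b0, b1, r) in results.items():
    print(f"{name}: y = {b0:.2f} + {b1:.2f}x, r = {r:.2f}")
```

The numbers agree across the four datasets even though the scatterplots look nothing alike, which is why a fitted line should always be checked against a plot.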


Terminology

In discussing linear models, it is common that:

The x-variable is called the predictor, or explanatory, variable.
The y-variable is called the response variable.

The line is called the fitted line and expressed in the form

ŷ = b0 + b1 x.

The “hat” notation, such as ŷ, is used throughout statistics to denote an estimated or predicted value derived from a model.
Prediction

[Figure: plot illustrating prediction from the fitted line.]
Are residuals errors?

For each data point $(x_i, y_i)$ we have

$\hat{y}_i = b_0 + b_1 x_i$

which is the “predicted value” for $y_i$.

So put

$e_i = y_i - \hat{y}_i$

and call this the residual or error. It is the difference between the observed value and the fitted value.
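
As a small worked example, the sketch below computes the fitted values and residuals for a toy dataset. The data and the names `fit_line`, `fitted` and `residuals` are invented for illustration and are not from the lecture.

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and gradient b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Toy data invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]               # predicted value for each x
residuals = [y - f for y, f in zip(ys, fitted)]  # observed minus fitted

for x, y, f, e in zip(xs, ys, fitted, residuals):
    print(f"x={x}  y={y:5.2f}  fitted={f:5.2f}  residual={e:+.2f}")
print(f"sum of residuals = {sum(residuals):.10f}")
```

For a least squares line with an intercept, the residuals always sum to zero (up to rounding), so points above the line are balanced by points below it.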
Residual Plot
A residual plot is a plot whose vertical axis is the residuals, and
whose horizontal axis is the predictor variable or the fitted value.
Residual Analysis

Is there a pattern?
Are they centred around zero?
Are there any outliers?
Is the variance of the residuals roughly constant?
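
These questions are normally answered by looking at the residual plot. As a rough numeric companion, the sketch below computes three simple summaries: the mean residual, the ratio of residual spread between the upper and lower halves of the fitted values, and the points whose residual exceeds a cut-off. Everything here is an assumption made for illustration: the data are invented, `residual_checks` is a hypothetical helper, and the cut-off of 2 standard deviations is a common rule of thumb rather than anything stated in the lecture. It does not test for a pattern, which still needs the plot.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least squares intercept b0 and gradient b1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residual_checks(fitted, residuals, outlier_sd=2.0):
    """Rough numeric summaries of a residual plot (illustrative only)."""
    sd = stdev(residuals)
    # Split the points into lower and upper halves by fitted value.
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[half:]]
    return {
        "mean": mean(residuals),                      # centred around zero?
        "spread_ratio": stdev(high) / stdev(low),     # roughly constant variance?
        "outliers": [i for i, e in enumerate(residuals)
                     if abs(e) > outlier_sd * sd],    # any unusually large residuals?
    }


# Toy data invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.8, 16.1]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
checks = residual_checks(fitted, residuals)
print(checks)
```

A spread ratio far from 1 suggests the variance changes with the fitted value, and a non-empty outlier list points at observations worth inspecting on the plot.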
Satisfactory Residual Analysis

[Figure: two examples, side by side. Top row: Response plotted against Explanatory variable. Bottom row: the corresponding Residuals plotted against Fitted Values.]


Unsatisfactory Residual Analysis

[Figure: two examples, side by side. Top row: Response plotted against Explanatory variable. Bottom row: the corresponding Residuals plotted against Fitted Values.]


More residual analysis

[Figure: two further examples, side by side. Top row: Response plotted against Explanatory variable. Bottom row: the corresponding Residuals plotted against Fitted Values.]
