
University of Gondar
College of Medicine and Health Sciences
Department of Epidemiology and Biostatistics

Linear Regression

Haileab F. (BSc., MPH)


Scatter Plots and Correlation
 Before trying to fit any model, it is better to examine a scatter plot of the data (see the sketch below)
 A scatter plot (or scatter diagram) is used to show the relationship between two variables
 If a scatter plot shows some sort of linear relationship, we can use correlation analysis to measure the strength of the linear relationship between the two variables
o Correlation is concerned only with the strength of the linear relationship and its direction
o The two variables are treated equally; as a result, no causal effect is implied
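
As an aside (not part of the original slides), a minimal Python sketch of how such a scatter plot could be drawn with matplotlib, using the child height/weight values from the correlation example later in these slides:

    import matplotlib.pyplot as plt

    # Child height (cm) and child weight (kg) from the worked example below
    height = [35, 49, 27, 33, 60, 21, 45, 51]
    weight = [8, 9, 7, 6, 13, 7, 11, 12]

    plt.scatter(height, weight)              # one point per child
    plt.xlabel("Child height (cm)")
    plt.ylabel("Child weight (kg)")
    plt.title("Weight against height")
    plt.show()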
Scatter Plot Examples

[Figures: example scatter plots illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship at all]
Correlation Coefficient
 The population correlation coefficient ρ (rho) measures the strength of the association between the variables
 The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations

Features of ρ and r
 Unit free
 Ranges between -1 and 1
 The closer to -1, the stronger the negative linear relationship
 The closer to 1, the stronger the positive linear relationship
 The closer to 0, the weaker the linear relationship


Examples of Approximate r Values

[Figure: example scatter plots with approximate correlation coefficients r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]
Calculating the Correlation Coefficient
Sample correlation coefficient:

  r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²] = SS_xy / √(SS_xx · SS_yy)

or the algebraic equivalent:

  r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}

where:
r = sample correlation coefficient
n = sample size
x = value of the independent variable
y = value of the dependent variable
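
As an illustration (not part of the original slides), a minimal Python sketch of the algebraic formula above; sample_r is an arbitrary helper name, and the data are assumed to be two equal-length lists:

    import math

    def sample_r(x, y):
        """Sample correlation coefficient via the algebraic formula."""
        n = len(x)
        sum_x, sum_y = sum(x), sum(y)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))
        sum_x2 = sum(xi ** 2 for xi in x)
        sum_y2 = sum(yi ** 2 for yi in y)
        numerator = n * sum_xy - sum_x * sum_y
        denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
        return numerator / denominator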
Example

Child height (cm), x   Child weight (kg), y   xy      x²       y²
35                     8                      280     1225     64
49                     9                      441     2401     81
27                     7                      189     729      49
33                     6                      198     1089     36
60                     13                     780     3600     169
21                     7                      147     441      49
45                     11                     495     2025     121
51                     12                     612     2601     144
Σx = 321               Σy = 73                Σxy = 3142   Σx² = 14111   Σy² = 713
Calculation Example

[Figure: scatter plot of child weight against child height]

  r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
    = [8(3142) − (321)(73)] / √{[8(14111) − (321)²][8(713) − (73)²]}
    ≈ 0.886

r = 0.886 → relatively strong positive linear association between x and y
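
Running the sample_r sketch from earlier on this example's data reproduces the hand calculation:

    height = [35, 49, 27, 33, 60, 21, 45, 51]   # x
    weight = [8, 9, 7, 6, 13, 7, 11, 12]        # y
    print(round(sample_r(height, weight), 3))   # 0.886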
Significance Test for Correlation
 Hypotheses
   H0: ρ = 0 (no correlation)
   HA: ρ ≠ 0 (correlation exists)
 Test statistic (with n − 2 degrees of freedom):

  t = r / √[(1 − r²) / (n − 2)]

The degrees of freedom are n − 2 because any two points can always be joined exactly by a straight line, so two observations alone give no information about the strength of the linear relationship.
Example:
Is there evidence of a linear relationship between child height and weight at the 0.05 level of significance?

 H0: ρ = 0 (no correlation)
 H1: ρ ≠ 0 (correlation exists)
 α = 0.05, df = 8 − 2 = 6

  t = r / √[(1 − r²) / (n − 2)] = 0.886 / √[(1 − 0.886²) / (8 − 2)] = 4.68

Since 4.68 exceeds the critical value t(0.025, 6) = 2.447, we reject H0 and conclude that there is a significant linear relationship between child height and weight.
Introduction to Regression Analysis
 Regression analysis is used to:
 Predict the value of a dependent variable based on the value of at least one independent variable
 Explain the impact of changes in an independent variable on the dependent variable

 Dependent variable: the variable we wish to explain. In linear regression it is always a continuous variable
 Independent variable: the variable used to explain the dependent variable. In linear regression it can have any measurement scale
Simple Linear Regression Model
 Only one independent variable, x
 The relationship between x and y is described by a linear function
 Changes in y are assumed to be caused by changes in x
Population Linear Regression
The population regression model:

  y = β0 + β1x + ε

where y is the dependent variable, β0 is the population y-intercept, β1 is the population slope coefficient, x is the independent variable, and ε is the random error term (residual). β0 + β1x is the linear component; ε is the random error component.
Population Linear Regression

[Figure: the regression line y = β0 + β1x, with intercept β0 and slope β1; for a given xi, the random error εi is the vertical distance between the observed value of y and the predicted value on the line]
Estimated Regression Model
The sample regression line provides an estimate of the population regression line:

  ŷi = b0 + b1xi

where ŷi is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.

The individual random error terms ei have a mean of zero.
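
As an illustration (not part of the original slides), a minimal Python sketch of how b0 and b1 are obtained by least squares, using the standard formulas b1 = SS_xy / SS_xx and b0 = ȳ − b1·x̄; least_squares is an arbitrary helper name:

    def least_squares(x, y):
        """Least-squares estimates of the intercept b0 and slope b1."""
        n = len(x)
        xbar, ybar = sum(x) / n, sum(y) / n
        ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        ss_xx = sum((xi - xbar) ** 2 for xi in x)
        b1 = ss_xy / ss_xx
        b0 = ybar - b1 * xbar
        return b0, b1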


Interpretation of the Slope and the Intercept

 b0 is the estimated average value of y when the value of x is zero (provided that x = 0 is inside the data range considered)
o Otherwise it represents the portion of the variability of the dependent variable left unexplained by the independent variables considered

 b1 is the estimated change in the average value of y as a result of a one-unit change in x
Example: Simple Linear Regression
 A researcher wishes to examine the relationship between the average amount of food taken per day by a cohort of 20 sample children and the weight they gained in one month (both measured in kg). The content of the food is the same for all of them.

 Dependent variable (y) = weight gained in one month, measured in kilograms
 Independent variable (x) = average weight of food taken per day by a child, measured in kilograms
Sample Data for the Child Weight Model

Diet (x)   Weight gained (y)      Diet (x)   Weight gained (y)
0.40       0.65                   0.86       1.10
0.46       0.66                   0.89       1.12
0.55       0.63                   0.91       1.20
0.56       0.73                   0.93       1.32
0.65       0.78                   0.96       1.33
0.67       0.76                   0.98       1.35
0.78       0.72                   1.02       1.42
0.79       0.84                   1.04       1.10
0.80       0.87                   1.08       1.50
0.83       0.97                   1.11       1.30
Regression Using SPSS
Analyze → Regression → Linear…

Coefficients

Model         Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)    0.160              0.077                            2.065   .054
foodweight    0.643              0.073        .900                8.772   .000

Weight gained = 0.16 + 0.643(food weight)
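
For comparison (not the SPSS run shown above), a minimal Python sketch of fitting a simple regression of weight gained on diet with statsmodels; the summary prints coefficients, standard errors, t values and significance in a layout similar to the SPSS coefficients table:

    import statsmodels.api as sm

    # Diet (x) and weight gained (y), entered from the table on the previous slide
    diet = [0.40, 0.46, 0.55, 0.56, 0.65, 0.67, 0.78, 0.79, 0.80, 0.83,
            0.86, 0.89, 0.91, 0.93, 0.96, 0.98, 1.02, 1.04, 1.08, 1.11]
    gain = [0.65, 0.66, 0.63, 0.73, 0.78, 0.76, 0.72, 0.84, 0.87, 0.97,
            1.10, 1.12, 1.20, 1.32, 1.33, 1.35, 1.42, 1.10, 1.50, 1.30]

    X = sm.add_constant(diet)        # adds the intercept column
    model = sm.OLS(gain, X).fit()    # ordinary least squares: gain on diet
    print(model.summary())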


Interpretation of the Intercept, b0

Weight gained = 0.16 + 0.643(food weight)

 Here, no child had 0 kilograms of food per day, so for food amounts within the range observed, 0.16 kg is the portion of the weight gained not explained by food
 b1 = 0.643 tells us that the weight gained by a child increases by 0.643 kg, on average, for each additional kilogram of food taken per day
Multiple Linear Regression
 Multiple linear regression (MLR) is a statistical method for estimating the relationship between a dependent variable and two or more independent (or predictor) variables

 Function: Ypred = B0 + B1X1 + B2X2 + … + BnXn
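
For instance (not part of the original slides), a minimal sketch of evaluating this prediction function for a single observation; predict, b0, coefs and xs are arbitrary illustrative names:

    def predict(b0, coefs, xs):
        """Ypred = B0 + B1*X1 + ... + Bn*Xn for one observation."""
        return b0 + sum(b * x for b, x in zip(coefs, xs))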


Multiple Linear Regression
 Simply, MLR is a method for studying the relationship between a dependent variable and two or more independent variables
 Purposes:
o Prediction
o Explanation
o Theory building
Variations
 Total variation in Y = variation predictable from the combination of independent variables + unpredictable variation
Multiple Regression

Example: Regress the percentage of fat relative to body (%fat) on age and sex. (SPSS results on the next slide.)

%fat   Age    Sex
9.5    23.0   0.0
27.9   23.0   1.0
7.8    27.0   0.0
17.8   27.0   0.0
31.4   39.0   1.0
25.9   41.0   1.0
27.4   45.0   0.0
25.2   49.0   1.0
31.1   50.0   1.0
34.7   53.0   1.0
42.0   53.0   1.0
42.0   54.0   1.0
29.1   54.0   1.0
32.5   56.0   1.0
30.3   57.0   1.0
21.0   57.0   1.0
33.0   58.0   1.0
33.8   58.0   1.0
41.1   60.0   1.0
34.5   61.0   1.0
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .729a   .532       .506                6.5656                       .532              20.440     1     18    .000
2       .794b   .631       .587                5.9986                       .099              4.564      1     17    .047
a. Predictors: (Constant), sex; b. Predictors: (Constant), sex, age

ANOVA

Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   881.128          1    881.128       20.440   .000a
    Residual     775.932          18   43.107
    Total        1657.060         19
2   Regression   1045.346         2    522.673       14.525   .000b
    Residual     611.714          17   35.983
    Total        1657.060         19
a. Predictors: (Constant), sex; b. Predictors: (Constant), sex, age; c. Dependent Variable: %age of body fat

Coefficients

Model            Unstandardized B   Std. Error   Standardized Beta   t       Sig.   95% CI Lower   95% CI Upper
1   (Constant)   15.625             3.283                            4.760   .000   8.728          22.522
    sex          16.594             3.670        .729                4.521   .000   8.883          24.305
2   (Constant)   6.209              5.331                            1.165   .260   -5.039         17.457
    sex          10.130             4.517        .445                2.243   .039   .600           19.659
    age          .309               .145         .424                2.136   .047   .004           .614
a. Dependent Variable: %age of body fat relative to body
 Write the model for the output.
 Interpret the findings.
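
One way to check the answers is to refit the model in Python; a minimal statsmodels sketch (not the SPSS run above) that should reproduce, up to rounding, the Model 2 coefficients in the table: %fat = 6.209 + 10.130·sex + 0.309·age.

    import statsmodels.api as sm

    # %fat, age and sex entered from the data slide above
    fat = [9.5, 27.9, 7.8, 17.8, 31.4, 25.9, 27.4, 25.2, 31.1, 34.7,
           42.0, 42.0, 29.1, 32.5, 30.3, 21.0, 33.0, 33.8, 41.1, 34.5]
    age = [23, 23, 27, 27, 39, 41, 45, 49, 50, 53,
           53, 54, 54, 56, 57, 57, 58, 58, 60, 61]
    sex = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

    X = sm.add_constant(list(zip(sex, age)))   # columns: intercept, sex, age
    model = sm.OLS(fat, X).fit()
    print(model.params)                        # b0, b_sex, b_age
    print(model.summary())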
Thank you!!
