
Lecture 11 - Simple Linear and Simple Logistic Regression

1. The document discusses simple linear regression, which models the relationship between two variables (an independent variable (X) and dependent variable (Y)) with a linear equation.
2. The method of least squares is used to determine the best fit line by minimizing the sum of the squared vertical distances between the actual data points and the regression line.
3. The regression coefficients (B values) indicate the relationship between X and Y - the y-intercept is the value of Y when X is zero, and the slope is the change in Y for a one unit change in X.

Uploaded by

Rhema

BIO 3211 – Biometry and Biostatistics

11 – Regression
- Aarif Baksh

1
Introduction

Linear Regression - An approach to modelling the _________ between two variables by fitting a linear equation to the _________ data.

The two types of variables are called the ______________ and the _______________.

The IV is also called the _______________ or _________________.

The DV is also called the _______________ or _________________.

2
Introduction

3
Simple Linear Regression

A _________________ is a straight line that is the best approximation of a given set of data.

The most common method of determining the best fit line is the method of __________________. In this method, a line that makes the _________________ of the vertical distances of the data points from the line as small as possible is used.
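The least-squares idea can be sketched in a few lines of Python. This is only an illustration: the data values are invented, and the helper names `sum_squared_residuals` and `least_squares_line` are hypothetical, not part of the lecture. The sketch fits the line with the usual closed-form formulas for one predictor and then checks that nudging the intercept or slope makes the sum of squared vertical distances larger.

```python
# Hypothetical (x, y) data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_residuals(b0, b1):
    """Sum of squared vertical distances between each y and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line():
    """Closed-form least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_line()
best = sum_squared_residuals(b0, b1)

# Every nearby line tried here has a larger sum of squared residuals.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_squared_residuals(b0 + db0, b1 + db1) > best

print(f"intercept={b0:.3f}, slope={b1:.3f}, SSE={best:.4f}")
```

Squaring the distances keeps positive and negative deviations from cancelling, which is why the criterion is a sum of squares rather than a plain sum.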

4
Simple Linear Regression

◊ Method of Least Squares

The difference between each actual value, $y$, and each predicted value, $\hat{y}$, is called a _____________.

5
Simple Linear Regression

◊ Method of Least Squares


We want the residuals to be as small as possible
because ____________________________.

6
Simple Linear Regression

◊ Equation of the LS Regression Line
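
As a hedged reminder, the least-squares line for a single predictor is usually written in the form below. The symbols $b_0$ (y-intercept) and $b_1$ (slope) are assumed notation here and may differ from the symbols used in the lecture.

```latex
\hat{y} = b_0 + b_1 x,
\qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The last identity says that the fitted line always passes through the point of means $(\bar{x}, \bar{y})$.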

7
Simple Linear Regression

Question 1 (Linear Regression):

In a statistics course, we want to see if there is any relationship between study time and scores in the mid-semester exam. The scatter plot for the data is shown. Describe the relationship.

8
Simple Linear Regression

Question 2 (Linear Regression):

In a statistics course, we want to see if there is any relationship between study time and scores in the mid-semester exam. An output table from SPSS is shown.

a) Write the equation of the estimated regression line.
b) Use the line to predict the exam score if a student studies for 19 hours.
9
Simple Linear Regression

◊ Interpreting Regression Coefficients

(y-intercept): The value of the _____ when the ____ is zero.

(slope): The ________ in the DV for every _______ increase in the IV.

c) Interpret the B values from the table.

10
Simple Linear Regression

Question 3 (Linear Regression):

A university student, as part of a research project, collected data on the amount of rainfall in millimetres for Johns and Bloomfield, two neighbouring villages, for eight consecutive days in May 2016. She is interested in determining if the amount of rainfall at Johns can be used to predict the amount of rainfall at Bloomfield. She used SPSS to perform a regression analysis and the output tables of the results are shown.

Table 1: Descriptive Statistics

                          Mean      Std. Deviation   N
Rainfall_at_Bloomfield    8.3625    8.87854          8
Rainfall_at_Johns         8.6500    10.25057         8

Table 2: Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .997^a   .994       .993                .72151
a. Predictors: (Constant), Rainfall_at_Johns

Table 3: Coefficients^a

                          Unstandardized Coefficients    Standardized Coefficients
Model                     B               Std. Error     Beta                        t        Sig.
1  (Constant)             _____________   .344                                       2.595    .041
   Rainfall_at_Johns      .864            .027           .997                        32.465   .000
a. Dependent Variable: Rainfall_at_Bloomfield

11


Simple Linear Regression

Question 3 (Linear Regression):

a) Calculate the value of the blank space in Table 3.
b) Write the equation of the estimated regression line of Y on X and use the equation to determine the amount of rainfall at Bloomfield when the amount of rainfall at Johns is 8.5 mm.
c) Is it safe to say that a strong linear relationship exists between the rainfall at Johns and the rainfall at Bloomfield? Explain.
d) Interpret the .994 in Table 2.
e) Interpret the .864 in Table 3.

12
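
One possible way to approach parts a) and b) is sketched below in Python. It relies on the fact that a least-squares line passes through the point of means, so the constant can be recovered from the two means in Table 1 and the slope in Table 3. The variable and function names are hypothetical, and the result is only as precise as the rounded values printed in the tables.

```python
# Values copied from Tables 1 and 3 of the SPSS output.
mean_bloomfield = 8.3625   # mean of Y (Rainfall_at_Bloomfield)
mean_johns = 8.6500        # mean of X (Rainfall_at_Johns)
slope = 0.864              # B for Rainfall_at_Johns

# The least-squares line passes through (x-bar, y-bar),
# so the constant is y-bar minus slope times x-bar.
intercept = mean_bloomfield - slope * mean_johns

# Cross-check: t = B / Std. Error, so B is roughly t * Std. Error.
# The two estimates differ slightly because the table values are rounded.
intercept_from_t = 2.595 * 0.344


def predict_bloomfield(johns_mm):
    """Predicted rainfall at Bloomfield (mm) for a given rainfall at Johns (mm)."""
    return intercept + slope * johns_mm


print(f"constant from means: {intercept:.3f}")
print(f"constant from t * SE: {intercept_from_t:.3f}")
print(f"predicted Bloomfield rainfall at 8.5 mm: {predict_bloomfield(8.5):.2f} mm")
```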
Simple Linear Regression

13
Simple Linear Regression

Question 4 (Linear Regression):

Consider the data shown in the table. Use the pizza/subway fare data to predict the subway fare when pizza costs $2.25. The scatterplot, equation, correlation coefficient and p-value for model significance are shown.

$\hat{y} = 0.03456 + 0.9450x$
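
As a small illustration, the estimated line on this slide can be evaluated directly. The function name `predict_subway_fare` is hypothetical. Using the line for prediction presumes the model is significant, which has to be judged from the p-value on the slide.

```python
def predict_subway_fare(pizza_cost):
    """Evaluate the slide's estimated line: y-hat = 0.03456 + 0.9450 * x."""
    return 0.03456 + 0.9450 * pizza_cost


fare = predict_subway_fare(2.25)
print(f"Predicted subway fare: ${fare:.2f}")
```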

14
Simple Linear Regression

Question 5 (Linear Regression):

Consider the data shown in the table. Use the runs/subway fare data to predict the subway fare when 33 runs are scored in the World Series. The scatterplot, equation, correlation coefficient and p-value for model significance are shown.

$\hat{y} = 1.724 - 0.01138x$

15
Simple Linear Regression

Question 6 (Linear Regression):

Shown to the left are output tables for a regression done using SPSS. Use the tables to answer the following questions.
a) Which value tells us that there is a moderate positive relationship between the predictor and the outcome variables?
b) Which value tells us that the regression model is significant? According to this value, is the model significant?
c) As a result of b) above, what is the best predictor of y? (The model or ?)
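
The decision in part c) can be sketched as a simple rule, assuming the alternative to the model is the sample mean of y and assuming a 0.05 significance level; neither assumption is stated in the text. The coefficients below are taken from the runs/subway fare line in Question 5, while the p-values and the mean fare are hypothetical placeholders chosen only to exercise both branches.

```python
def best_prediction(x, b0, b1, p_value, y_mean, alpha=0.05):
    """Use the regression line if the model is significant, else the mean of y.

    alpha = 0.05 is an assumed significance level.
    """
    if p_value < alpha:
        return b0 + b1 * x
    return y_mean


# Coefficients from y-hat = 1.724 - 0.01138 x (Question 5).
b0, b1 = 1.724, -0.01138

# Hypothetical values, NOT taken from the lecture output.
hypothetical_mean_fare = 1.20

not_significant = best_prediction(33, b0, b1, p_value=0.60, y_mean=hypothetical_mean_fare)
significant = best_prediction(33, b0, b1, p_value=0.01, y_mean=hypothetical_mean_fare)

print(f"model not significant -> predict {not_significant:.2f}")
print(f"model significant     -> predict {significant:.2f}")
```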
16
Simple Linear Regression

◊ Requirements (Assumptions) of Simple Linear Regression

1. The sample is a random sample.
2. The true relationship is linear.
3. Residuals (Errors) are normally distributed.
4. Homoscedasticity of residuals (or, equal variance around the line).
5. The Y-values or the errors are independent.
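
A rough numeric companion to a residual-by-predicted plot is sketched below. The data are invented, and comparing the residual spread in the lower and upper halves of the predicted values is only an informal check chosen for this example, not a method taken from the lecture. A ratio far from 1 would hint at unequal variance, and a plot remains the better tool.

```python
# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.8, 16.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept for one predictor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

# Order the residuals by predicted value, as along the plot's horizontal axis.
ordered = [r for _, r in sorted(zip(predicted, residuals))]
half = n // 2


def mean_square(values):
    """Average squared value, used here as a simple measure of spread."""
    return sum(v ** 2 for v in values) / len(values)


spread_ratio = mean_square(ordered[half:]) / mean_square(ordered[:half])

print(f"sum of residuals: {sum(residuals):.2e}")
print(f"upper-half / lower-half residual spread: {spread_ratio:.2f}")
```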

17
Simple Linear Regression

◊ Linearity

18
Simple Linear Regression

◊ Normality

19
Simple Linear Regression

◊ Analyzing Residuals (Residual by Predicted Plot)

20
Simple Linear Regression

◊ Analyzing Residuals (Residual by Predicted Plot)

21
Simple Logistic Regression

Simple (Binary) Logistic Regression - An approach to modelling the relationship between a ________________ DV and an IV.

Some examples of dichotomous variables are _____________, _______________ and _________________.

A success is usually coded as _____ and a failure is usually coded as _____.
22
Simple Logistic Regression

Some problems with using OLS with a dichotomous DV.
1. The assumption of OLS is that the DV is ____________ instead of dichotomous.
2. The assumption of OLS is that a _______________ relation exists between the IV and the DV.
23
Simple Logistic Regression

Some problems with using OLS with a dichotomous DV.
3. The assumption of OLS is that the error term is ___________ distributed.
4. The assumption of OLS is that the variance of the error term is ____________ across all values of the independent variables.
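
One practical symptom of these problems can be shown with a short sketch: a straight line fitted by OLS to a 0/1 outcome can predict values below 0 or above 1, which cannot be read as probabilities. The data and the function name `ols_prediction` are hypothetical.

```python
# Hypothetical data: x is a predictor, y is a dichotomous outcome coded 0/1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def ols_prediction(x):
    """Straight-line prediction, which is not confined to the interval [0, 1]."""
    return intercept + slope * x


print(f"prediction at x=1: {ols_prediction(1):.3f}")
print(f"prediction at x=8: {ols_prediction(8):.3f}")
```

With these invented values the fitted line dips below 0 at the low end and rises above 1 at the high end.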
24
Simple Logistic Regression

Question 4 (Binary Logistic Regression):

The graph shows a logistic model for the relationship between the weight of a mouse and whether or not the mouse is obese (0 = not obese, 1 = obese). Describe the relationship.

25
Simple Logistic Regression
◊ Odds and Odds Ratio

The odds of an event is defined as the ratio of the probability that _______________ to the probability that ______________.

$\text{odds}(A) = \dfrac{P(A)}{1 - P(A)}$

If odds(A) is 1/5 or 1:5, we interpret this as: A occurs once for every five times it does not occur, OR we expect A to occur once in 6 trials.

Question 1 (Odds):
What are the odds of:
a) Obtaining a head when a fair coin is tossed once? Interpret the result.
b) Obtaining an even number when a fair die is rolled? Interpret the result.
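
The link between a probability and its odds can be sketched with exact fractions. The function names are hypothetical, and the example reuses the slide's own 1:5 case: an event with probability 1/6 has odds of 1/5, and converting back recovers 1/6.

```python
from fractions import Fraction


def odds(p):
    """Odds of an event with probability p: p divided by (1 - p)."""
    return p / (1 - p)


def probability_from_odds(o):
    """Inverse conversion: odds o correspond to probability o / (1 + o)."""
    return o / (1 + o)


p = Fraction(1, 6)           # the event occurs once in 6 trials
o = odds(p)                  # once for every five times it does not occur

print(o)                          # 1/5
print(probability_from_odds(o))   # 1/6
```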
26
Simple Logistic Regression

◊ Odds and Odds Ratio

The odds ratio is the ratio of two odds.

$\text{odds ratio} = \dfrac{\text{odds}(A)}{\text{odds}(B)}$

Odds ratios are frequently used to express the relative chance of an event happening under two different conditions.

If the odds ratio is 5, we interpret this as: the odds of the event happening under condition A are 5 times the odds of the event happening under condition B.

Question 2 (Odds Ratio):
The odds that a patient develops a headache using an old treatment is and the odds of developing a headache under the new treatment is . Calculate and interpret the odds ratio.
27
Simple Logistic Regression

◊ Odds and Odds Ratio

If the odds ratio is 5, we interpret this as: the odds of the event happening under condition A are 5 times the odds of the event happening under condition B.

Question 3 (Odds Ratio):
The probability of obtaining a head on a biased coin is .
a) Calculate the odds ratio of obtaining a head when the biased coin is tossed compared to when a fair coin is tossed and interpret the odds ratio.

28
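
A minimal sketch of an odds ratio calculation follows. The two probabilities are hypothetical and were chosen only so that the result matches the "5 times" interpretation above; they are not the values from Questions 2 or 3. The function names are also hypothetical.

```python
from fractions import Fraction


def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)


def odds_ratio(p_a, p_b):
    """Odds under condition A divided by odds under condition B."""
    return odds(p_a) / odds(p_b)


# Hypothetical probabilities of the event under each condition.
p_a = Fraction(5, 6)   # odds of 5 (5:1)
p_b = Fraction(1, 2)   # odds of 1 (1:1)

ratio = odds_ratio(p_a, p_b)
print(ratio)   # 5
```

An odds ratio of 1 would mean the odds are the same under both conditions, and a value below 1 would mean the odds are lower under condition A.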
Simple Logistic Regression

◊ The Logit Transformation

Logistic Regression uses the logit transformation to ________ the relationship between the IV and the DV.

$$\hat{p} = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
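
The estimated probability model and the logit can be sketched together in Python. The coefficient values below are hypothetical, chosen only to make the arithmetic easy to follow, and the function names are not from the lecture. The sketch shows that taking the logit (the natural log of the odds) of the estimated probability returns the linear expression $\beta_0 + \beta_1 x$.

```python
import math


def predicted_probability(x, beta0, beta1):
    """Estimated probability: e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1 + math.exp(z))


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


# Hypothetical coefficients, NOT taken from the lecture.
beta0, beta1 = -4.0, 0.5
x = 10

p_hat = predicted_probability(x, beta0, beta1)

print(f"linear predictor: {beta0 + beta1 * x:.3f}")
print(f"estimated probability: {p_hat:.4f}")
print(f"logit of estimated probability: {logit(p_hat):.3f}")
```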

29
Simple Logistic Regression

Question 5 (Binary Logistic Regression):

Given that in a binary logistic model, and .


a) Write the equation of the logit model.
b) What is if .
c) Write the equation of the estimated probability model.
d) What is the probability of a success if ?

30
Simple Logistic Regression

Question 6 (Binary Logistic Regression):

Do Motivation and Referral source affect whether a client completes a drug and alcohol treatment program? A researcher investigates this question. Motivation is a continuous variable and Referral is dichotomous ( agency referral, self referral). The outcome is coded complete the program and do not complete the program. The SPSS output is shown below.

31
