Regression Lecture Summary

This document discusses regression analysis and how it can be used to predict the value of a dependent variable (y) based on the value of an independent variable (x). It provides an example of using regression to predict the size of a household (y) based on the pounds of plastic discarded (x). The regression equation derived from the data is y=0.549+1.480(x). Therefore, the best prediction for a household discarding 0.50 lbs of plastic is 1.289 people or 1 person.

Uploaded by

abrham abageda


Regression

Review Lecture
Major Points
• Is there a relationship between x and y?
• What is the strength of this relationship?
• Pearson’s r
• Can we describe this relationship and use it to predict y from x?
• Regression
• Is the relationship we have described statistically significant?
• t test
The relationship between x and y
• Correlation: is there a relationship between two variables?
• Regression: how well does a certain independent variable predict the dependent variable?
• CORRELATION ≠ CAUSATION
• In order to infer causality: manipulate the independent variable and observe the effect on the dependent variable
Regression
• How well a set of data points fits a straight line can
be measured by calculating the distance between
the data points and the line.
• The total error between the data points and the
line is obtained by squaring each distance and then
summing the squared values.
• The regression equation is designed to produce the
minimum sum of squared errors.
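
The least-squares idea above can be sketched in a few lines of Python. This is only an illustration and not part of the original lecture: the function names `fit_line` and `sum_squared_errors` and the five data points are made up for the example. It fits the line with the standard closed-form formulas and then shows that nudging the intercept or slope away from the fitted values makes the sum of squared errors larger.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept: the fitted line always passes through (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Square each vertical distance to the line, then add them up."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [6, 5, 10, 11, 13]

a, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(a, b, best)

# Any other line does worse on the same data.
print(sum_squared_errors(xs, ys, a + 0.5, b) > best)
print(sum_squared_errors(xs, ys, a, b - 0.3) > best)
```

For these made-up points the fitted line is y = 3 + 2x, and both comparisons print `True`.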

Regression
• Is the statistical technique for finding the best-
fitting straight line for a set of data.
• To find the line that best describes the relationship
for a set of X and Y data.
Regression Analysis
• Question asked: Given one variable, can we predict values of another variable?
• Examples: Given the weight of a person, can we predict how tall he/she is? Given the IQ of a person, can we predict their performance in statistics? Given the basketball team’s wins, can we predict the extent of a riot? ...
Regression line
• makes the relationship between variables easier to
see.
• identifies the center, or central tendency, of the
relationship, just as the mean describes central
tendency for a set of scores.
• can be used for a prediction.
The Equation for a Line
Y = bX + a

• b = the slope
• a = y-intercept
• Y= predicted value
Regression
• The mathematical equation for a line:
Y = mx + b
Where: Y = the line’s position on the vertical axis at any point
X = the line’s position on the horizontal axis at any point
m = the slope of the line

b = the intercept with the Y axis, where X equals zero

Regression
• The statistics equation for a line:
$\hat{Y} = a + bX$
Where: $\hat{Y}$ = the line’s position on the vertical axis at any point (estimated value of the dependent variable)
X = the line’s position on the horizontal axis at any point (value of the independent variable for which you want an estimate of Y)
b = the slope of the line (called the coefficient)
a = the intercept with the Y axis, where X equals zero

Regression
• R²
• Is the improvement obtained by using X (and drawing a line through the conditional means) in getting as near as possible to everybody’s value for Y over just using the mean for Y alone.
• Falls between 0 and 1
• 1 means an exact fit (and there is no variation of scores around the regression line)
• 0 means no relationship (as much scatter around the line as in the original Y variable, and a flat regression line (slope = 0) through the mean of Y)
• Would be the same for X regressed on Y as for Y regressed on X
• Can be interpreted as the percentage of variability in Y that is explained by X.
• Some people get hung up on maximizing R², but this is too bad because any effect is still a finding. A small R² only indicates that you haven’t told the whole (or much of the) story of the relationship between your variables.
Correlation and Regression
Back to the SPSS output:

$$ r^2 = \frac{\sum (Y - \bar{Y})^2 - \sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} $$

71.194 ÷ 154.64 = .460
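
As a quick arithmetic check, the division on this slide can be reproduced in Python. The sketch reads 71.194 as the explained sum of squares and 154.64 as the total sum of squares, which is an interpretation of the slide rather than something it labels; the variable names are made up for the example. Taking the square root recovers the size of the correlation (the sign of r has to come from the direction of the relationship, not from this calculation).

```python
import math

ss_explained = 71.194   # numerator from the slide (assumed: explained SS)
ss_total = 154.64       # denominator from the slide (assumed: total SS)

r_squared = ss_explained / ss_total
r_magnitude = math.sqrt(r_squared)

print(round(r_squared, 3))    # 0.46
print(round(r_magnitude, 3))  # 0.679
```

The second value agrees with the correlation of .679 reported for these data.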

Correlation and Regression
Back to the SPSS output:

Of course, you get the standard error and t on your output, …and the p-value too!
Correlation and Regression
Our data’s correlation is .679. How strong is that?

Correlation, r, is significant.
Regression Example: Answer this question using Regression.
What is the best predicted size of a household that discards 0.50 lb of plastic?

Before you get too excited about this output, let’s cross-off the info that we
are not going to discuss or learn about in class. I’m only trying to give you an
elementary exposure to Regression. The following slides will show you the
things you need to understand and each of those items will be explained.

Info to help you understand Regression Info that you MUST know for test

Multiple R has no importance by itself; it is the square of this value, “R Square”, that is important.

“R Square” is also known as the “Coefficient of Determination” and represents the amount (percent) of the variance in y that is explained by x. You can sort-of say that this value shows how effective our regression equation is. In this case, x explains 71% of the variance in y, so loosely speaking our regression equation is 71% effective in predicting y, given some value of x.

These coefficients form the Regression Equation. In this case y=0.549+1.480(x). Thus, the answer to our question is y=0.549+1.480(0.50)=1.289 people (or 1, rounded to the nearest whole person, in a household that tosses 0.50 lbs. of plastic).
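
The prediction step can be written as a small Python function. The coefficients 0.549 and 1.480 are the ones quoted from the output; the function name `predict_household_size` is made up for this sketch.

```python
def predict_household_size(plastic_lbs, intercept=0.549, slope=1.480):
    """Plug x (pounds of plastic discarded) into y = intercept + slope * x."""
    return intercept + slope * plastic_lbs


prediction = predict_household_size(0.50)
print(round(prediction, 3))  # 1.289
print(round(prediction))     # 1 (nearest whole person)
```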

“Adjusted R Square” is the same as “R Square” EXCEPT that it is adjusted for sample size (and for the number of x variables). When the sample size (n) is small, this value will be quite a bit below the unadjusted R Square value. As the sample size increases, the difference between R Square and Adjusted R Square becomes negligible. Adjusted R Square is a more accurate representation of the effectiveness of our regression equation.
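
The effect of sample size can be seen with the usual textbook formula, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of x variables. The slides do not print this formula, so treat the sketch below as an illustration: the function name and the sample sizes are hypothetical, and only the R Square of 0.71 comes from the example.

```python
def adjusted_r_squared(r_squared, n, k=1):
    """Adjusted R^2 for n observations and k predictor (x) variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical sample sizes, all with the same unadjusted R Square of 0.71.
for n in (10, 50, 1000):
    print(n, round(adjusted_r_squared(0.71, n), 4))
```

With n = 10 the adjusted value sits noticeably below 0.71; with n = 1000 the two are almost identical.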

Standard Error is an estimate of how far off our y predictions will be (on average) using the regression equation.

Observations is simply n.

Here we simply see that Excel determines the appropriate degrees of freedom; there is nothing for us to do here.

Significance F is the p-value of the overall F test, and we compare it with our alpha level to decide whether the regression equation is statistically significant or NOT. Thus, if this value is LESS THAN our alpha value (usually 0.05) then the regression equation is statistically significant (i.e., a fit this good would be unlikely if x had no linear relationship with y). Conversely, if this value is greater than our alpha then the regression equation is not statistically significant (i.e., the data are reasonably consistent with no linear relationship). In this case, since Significance F is much less than 0.05 we conclude that the regression equation is statistically significant.

SS requires some explanation – see the next couple of slides.

You may remember the F test from the section on hypothesis testing. Excel does an F test to see if the regression equation is any better than simply using the average y as the predictor of y. A large F value generally suggests that the regression equation is effective, but we rely on the “Significance F” to tell for sure.

SS (Sum of Squares) NOT REQUIRED FOR YOU TO KNOW

SS Regression (Explained Deviation) – This is calculated by first finding the vertical distance between the regression line and the mean y value (y-bar) at each data point. Next, those individual differences are squared and then all added together (summed) to form the SS (sum of squares) Regression. In practical terms, in order for the regression equation to be valid, we should see a relatively high value here as compared to SS Residual.

SS Residual (Unexplained Deviation) – This is calculated by first finding the vertical distance between each data point and the regression line; those distances are then squared and summed. Since we hope our regression line explains the relationship between x and y, any difference between the line and a data point is “unexplained”. We hope this value is relatively small compared to SS Regression.

SS Total – This is simply SS Regression + SS Residual. It can also be calculated by finding the vertical distance between each data point and y-bar, squaring each value and adding them all up.

[Figure: plot of y (0 to 20) against x (0 to 9) showing the regression line y-hat = 3 + 2x and the mean line y-bar = 9. For one data point, the total deviation (y – y-bar) is split into the explained deviation (Regression), (y-hat – y-bar), and the unexplained deviation (Residual), (y – y-hat).]
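
The three sums of squares can be computed directly from their definitions. The sketch below is an illustration only: the five data points are made up (they are not the lecture's data), chosen so that their least-squares line is y-hat = 3 + 2x and their mean is y-bar = 9.

```python
# Hypothetical data whose least-squares line is y-hat = 3 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [6, 5, 10, 11, 13]

y_bar = sum(ys) / len(ys)            # mean of y
y_hats = [3 + 2 * x for x in xs]     # points on the regression line

# Explained: regression line versus the mean.
ss_regression = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)
# Unexplained: data points versus the regression line.
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
# Total: data points versus the mean.
ss_total = sum((y - y_bar) ** 2 for y in ys)

print(ss_regression, ss_residual, ss_total)  # 40.0 6 46.0
print(round(ss_regression / ss_total, 3))    # 0.87 (this ratio is R Square)
```

Here SS Regression + SS Residual equals SS Total (40 + 6 = 46). That identity holds because the line used is the least-squares line for the data; an arbitrary line would not split the total this way.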
Multiple Regression
Multiple Regression
• Multiple Regression is a process for defining a linear relationship between a dependent variable y and two or more independent variables (x1, x2, x3, . . . , xk)
• The linear equation for a regression problem in which we have multiple x variables is as follows, where b0 is the y-intercept and all the other b’s are coefficients associated with their respective x variables:

$$ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k $$
Multiple Regression Guidelines
• More x variables is NOT necessarily better
• Remember that R Square is a measure of how effective
our regression equation is. Therefore, if adding an x
variable does not appreciably increase the R Square value,
then DON’T add it
• Use those x variables (the fewest possible) that give you
the biggest R Square (or Adjusted R Square) value. We
want efficiency so a few variables that provide a big R
Square is best
Multiple Regression Example: Using the following data
(measurements taken from Bears that had been anesthetized), construct a
multiple regression equation to predict the weight of Bears.
Step 1: Construct a “Correlation Matrix” to see which x variables have the strongest
linear relationships with the y variable (weight). Use the Excel function Tools, Data
Analysis, Correlation to construct a correlation matrix. An Excel file containing this
Bear data and Correlation Matrix are on the class website (mrbear.xls).
• Ideally, we want to pick those few x variables that have strong correlations (close to
-1 or +1) with the y variable, BUT we also want the x variables to NOT be highly
correlated with each other
• The addition of an x variable that is strongly correlated with any x variable(s)
already in a multiple regression equation WILL NOT do much to increase the R
Squared or Adjusted R Square value
• On the other hand, adding an x variable that is strongly correlated with the y
variable, but NOT with any x variables already in the regression equation WILL
increase our R Squared value substantially
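
For readers working outside Excel, a correlation matrix can be built from Pearson's r for every pair of variables. The following Python sketch assumes made-up measurements for five bears; they are NOT the values in mrbear.xls, and the helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements (illustration only, not the class data).
data = {
    "Weight": [80, 150, 220, 300, 360],
    "Neck": [16, 20, 24, 28, 31],
    "Age": [10, 25, 50, 60, 100],
}

names = list(data)
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Print one row per variable, one column per variable.
for a in names:
    print(a.ljust(7), "  ".join(f"{matrix[a][b]:6.3f}" for b in names))
```

Read down the Weight column to see which x variables track y most closely, then check the off-diagonal entries among the x variables to see how strongly they overlap with each other.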

Choosing which x variables to include in a multiple regression problem is often a subjective decision. I suggest you try whichever variables you think will work best. For this example, I am going to choose Neck size as my first x variable to predict y (weight) since it has an awesome correlation of 0.971 with Weight. To pick a second x variable I want one that is highly correlated with Weight, but NOT with Neck. Unfortunately, it looks like all the other x variables ARE highly correlated with Neck. In fact, if this were a real problem we would probably stick with simple regression and just use Neck to predict Weight. But, since we have to do a multiple regression problem I think I’ll pick Age as my second x variable. Age is not that highly correlated with Weight (0.814), but it is also the least highly correlated with Neck (0.906). Why don’t you try the problem with other x variables and see if you can beat my Adjusted R Square value.

• For Multiple Regression we use the same Tools, Data Analysis, Regression function in Excel that we used for simple Regression.
• The “Y Range” is still simply the range of cells that contains the y-variable.
• The “X Range” is the range of cells that contains ALL the x-variables we want to include in the regression model. MAKE SURE the x-variables are in adjacent columns (you cannot skip columns). Notice how I now have Neck and Age right next to each other.
• I like to include the column headings or labels, so I checked the “labels” box. When you do this the output also includes the labels, making it a lot easier to interpret. Everything else is the same as when we did simple Regression.

Again, before we analyze the output, let’s cross-off the info that we are not
going to discuss or learn about in class. The following slides will explain
those things you need to understand.

Just a note: I ran another regression with Neck size as the only x variable. You can see the Adjusted R Square is 0.9330 as compared to the Adjusted R Square of 0.9536 that we got when we included Age in addition to Neck. I don’t think I would consider the slight increase in Adjusted R Square from including Age as really being worth the trouble of including another variable. There is no set guideline as to how much R Square should increase to justify adding another variable, but going from an already high R Squared to a slightly higher value does not seem worthwhile in my opinion.

Multiple R – same story as before.

Almost the same story as in simple Regression: “R Square” or the “Coefficient of Determination” represents the amount (percent) of variance in y that is explained by the x’s. In this case the x’s explain 97% of the variance in y (Weight), so loosely speaking our regression equation is 97% effective in predicting y. Adjusted R Square simply adjusts the R Square value for the number of x variables and the sample size. Lots of variables and/or small sample sizes reduce the Adjusted R Square. A few good variables and a large sample size bring Adjusted R Square very close to R Square.

These coefficients form the Regression Equation. In this case y=-307.817+26.388(Neck)-1.527(Age). Thus, if we found a bear with a Neck of 25” and an Age of 48 months we could predict its Weight as y=-307.817+26.388(25)-1.527(48)=278.6 lbs.
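
The same calculation as a short Python sketch. The three coefficients are the ones quoted from the output; the function name `predict_weight` and its parameter names are made up for the example.

```python
def predict_weight(neck_in, age_months):
    """Predicted weight (lbs) from neck size (inches) and age (months)."""
    return -307.817 + 26.388 * neck_in - 1.527 * age_months


print(round(predict_weight(25, 48), 1))  # 278.6
```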

Like simple Regression, if the Significance F is less than alpha then our regression equation is statistically significant.

Same general comments as simple Regression. Note that MS is SS/df and the F statistic is calculated as MS Regression / MS Residual (just in case you are curious).
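
For the curious, the MS and F arithmetic looks like this in Python. The sums of squares and sample size below are made up for illustration (they are not the bear output), and the degrees of freedom follow the standard convention, which the slide does not spell out: k for Regression and n − k − 1 for Residual, where k is the number of x variables.

```python
def f_statistic(ss_regression, ss_residual, n, k):
    """F = MS Regression / MS Residual, where each MS is SS / df."""
    df_regression = k            # one degree of freedom per x variable
    df_residual = n - k - 1      # observations minus estimated coefficients
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


# Hypothetical values: SS Regression = 40, SS Residual = 6, n = 5, one x.
print(f_statistic(40.0, 6.0, n=5, k=1))  # 20.0
```

Turning F into the Significance F value requires the F distribution, which Excel handles for us.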

See next slide for explanation of these important values.

Excel calculates a t test statistic for each of the x variables (ignore the t Stat for Intercept). The next column (P-value) is the important one because it provides the result of comparing this test statistic with the critical value. The critical values are not shown, but we don’t need to see them because the P-value effectively tells us whether this test statistic is inside or outside the critical value (see P-value for more explanation).

The P-value tells us whether the x variable is statistically significant in the regression equation. If the P-value is less than alpha (usually 0.05) then that x variable is a significant contributor in the regression equation. If the P-value is greater than alpha then that x variable does not contribute significantly to the regression equation. In this case, Neck has a P-value much less than 0.05, so we see that Neck size contributes significantly to our regression equation to predict a Bear’s weight. On the other hand, Age has a P-value greater than 0.05, so we see that adding the Age x variable into the equation was not a good idea since Age does not significantly help us predict a Bear’s weight in our regression equation.
