Linear Regression

Linear regression finds the best-fitting straight line to describe the relationship between two variables, X and Y. It determines the slope (b) and Y-intercept (c) that minimize the sum of squared residuals, or errors between the observed Y-values and those predicted by the straight line. Specifically, the least squares solution finds the b and c that make the predicted Y-values (Ŷ) as close as possible to the actual Y-values.

Linear Regression

James H. Steiger
Regression – The General Setup
You have a set of data on two variables, X and Y, represented in a scatter plot. You wish to find a simple, convenient mathematical function that comes close to most of the points, thereby describing succinctly the relationship between X and Y.
Linear Regression
The straight line is a particularly simple function. When we fit a straight line to data, we are performing linear regression analysis.
Linear Regression
The Goal
• Find the “best fitting” straight line for a set of data. Since every straight line fits the equation Y = bX + c, with slope b and Y-intercept c, it follows that our task is to find a b and c that produce the best fit.
• There are many ways of operationalizing the notion of a “best fitting straight line.” A very popular choice is the “Least Squares Criterion.”
The Least Squares Criterion
For every data point (Xᵢ, Yᵢ), define the “predicted score” Ŷᵢ to be the value of the straight line evaluated at Xᵢ. The “regression residual,” or “error of prediction,” Eᵢ is the distance from the straight line to the data point in the up-down direction.
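To make these definitions concrete, here is a minimal sketch in Python with NumPy. The data values and the candidate b and c are made up purely for illustration; they are not from the lecture.

```python
import numpy as np

# Made-up illustrative data (not from the lecture)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# An arbitrary candidate line Y = bX + c
b, c = 1.0, 1.0

Y_hat = b * X + c        # predicted scores, one per data point
E = Y - Y_hat            # regression residuals (errors of prediction)
SSE = np.sum(E ** 2)     # sum of squared residuals

print(Y_hat, E, SSE)
```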
The Least Squares Criterion
[Figure: scatter plot with the fitted straight line, showing one data point, its predicted score Ŷᵢ on the line, and the residual Eᵢ as the vertical distance between them.]
The Least Squares Criterion
The least squares criterion operationalizes the notion of best fitting line as that choice of b and c that minimizes the sum of squared “residuals,” or errors. Solving this problem is an elementary problem in differential calculus. The method of solution need not concern us here, but the result is famous, important, and should be memorized.
The Least Squares Solution

b = r_{YX} \frac{S_Y}{S_X}

c = \bar{Y} - b\,\bar{X}

Here r_{YX} is the correlation between Y and X, S_Y and S_X are the standard deviations of Y and X, and \bar{Y} and \bar{X} are their means.
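The result can also be checked numerically. Below is a minimal sketch (Python/NumPy, with the same made-up data as above); np.polyfit is used only as an independent reference for the least squares line.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up illustrative data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

r_yx = np.corrcoef(Y, X)[0, 1]             # correlation r_YX
S_X, S_Y = X.std(ddof=1), Y.std(ddof=1)    # standard deviations

b = r_yx * S_Y / S_X                       # slope from the formula above
c = Y.mean() - b * X.mean()                # intercept from the formula above

# Independent check against NumPy's own least squares fit
b_ref, c_ref = np.polyfit(X, Y, deg=1)
assert np.allclose([b, c], [b_ref, c_ref])
print(b, c)
```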
The Standardized Least Squares Solution
Suppose X and Y are both in Z-score form. What will the equations for the least squares coefficients b and c reduce to? (C.P.)
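One way to check your answer is numerical: standardize X and Y and apply the general formulas. A minimal sketch (Python/NumPy, made-up data as before) follows.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up illustrative data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Put both variables in Z-score form
Zx = (X - X.mean()) / X.std(ddof=1)
Zy = (Y - Y.mean()) / Y.std(ddof=1)

r = np.corrcoef(Zy, Zx)[0, 1]

# General least squares formulas applied to the standardized scores
b_z = r * Zy.std(ddof=1) / Zx.std(ddof=1)  # the two SDs are both 1
c_z = Zy.mean() - b_z * Zx.mean()          # the two means are both 0

print(b_z, c_z)   # compare these with r and with 0
```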
Variance of Predicted and Error Scores
When you compute a regression line, you can “plug in” all the X scores into the equation for the regression line, and obtain predicted and error scores for each person. If you do this, it is well known that the variances of the predicted and error scores are

S_{\hat{Y}}^2 = r_{YX}^2 S_Y^2 \qquad\text{and}\qquad S_E^2 = (1 - r_{YX}^2) S_Y^2

Moreover, the predicted and error scores are always precisely uncorrelated, that is,

S_{\hat{Y},E} = 0
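These identities are easy to confirm numerically. A minimal sketch (Python/NumPy, made-up data as before) computes the predicted and error scores from the least squares line and checks both variance identities and the zero covariance.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up illustrative data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b, c = np.polyfit(X, Y, deg=1)             # least squares slope and intercept
Y_hat = b * X + c                          # predicted scores
E = Y - Y_hat                              # error scores

r2 = np.corrcoef(Y, X)[0, 1] ** 2

# Variance identities (ddof=1 used consistently)
assert np.isclose(Y_hat.var(ddof=1), r2 * Y.var(ddof=1))
assert np.isclose(E.var(ddof=1), (1 - r2) * Y.var(ddof=1))

# Predicted and error scores are exactly uncorrelated
assert np.isclose(np.cov(Y_hat, E, ddof=1)[0, 1], 0.0)
print("variance identities and zero covariance confirmed")
```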
Variance of Predicted and Error Scores
Can you prove the results on the preceding page, using linear transformation and linear combination theory? (C.P.)
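As a hint toward the first of these results, here is a sketch of the predicted-score part in LaTeX; it assumes the linear transformation rule that the variance of aX + k equals a² times the variance of X, together with the slope formula from the least squares solution. The error-score variance and the zero covariance are left as the exercise intends.

\hat{Y}_i = bX_i + c
\quad\Longrightarrow\quad
S_{\hat{Y}}^2 = b^2 S_X^2
= \left( r_{YX}\frac{S_Y}{S_X} \right)^{2} S_X^2
= r_{YX}^2 S_Y^2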
