Lecture 6: Linear Regression
The relationship between two variables may be one of functional dependence of one on the other.
That is, the magnitude of one of the variables (the dependent variable) is assumed to be
determined by (that is, is a function of) the magnitude of the second variable (the independent
variable), whereas the reverse is not true. For example, in the relationship between blood
pressure and age in humans, blood pressure may be considered the dependent variable and age
the independent variable; we may reasonably assume that although the magnitude of a person's
blood pressure might be a function of age, age is not determined by blood pressure. This is not to
say that age is the only biological determinant of blood pressure, but we do consider it to be one
determining factor. The term dependent does not necessarily imply a cause-and-effect
relationship between the two variables. Such a dependence relationship is called a regression.
The term simple regression refers to the simplest kind of regression, one in which only two
variables are considered. Multiple regression involves more than two variables.
Data amenable to simple regression analysis consist of pairs of data measured on a ratio or
interval scale. These data are composed of measurements of a dependent variable (Y) that is a
random effect and an independent variable (X) that is either a fixed effect or a random effect.
It is convenient and informative to graph simple regression data using the ordinate (Y axis) for
the dependent variable and the abscissa (X axis) for the independent variable. Such a graph is
shown in the figure below for n = 13 data pairs, where the data appear as a scatter of 13 points, each
point representing a pair of X and Y values. One pair of X and Y data may be designated as (X1,
Y1), another as (X2, Y2), another as (X3, Y3), and so on, resulting in what is called a scatter plot
of all n of the (Xi, Yi) data.
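As an illustration, here is a minimal Python sketch of such a scatter plot using matplotlib; the 13 (X, Y) pairs are invented placeholders, not the data from the lecture's figure:

import matplotlib.pyplot as plt

# Hypothetical (Xi, Yi) pairs; n = 13 as in the figure described above.
X = [3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 16, 17]
Y = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

plt.scatter(X, Y)                       # one point per (Xi, Yi) pair
plt.xlabel("X (independent variable)")  # abscissa
plt.ylabel("Y (dependent variable)")    # ordinate
plt.title("Scatter plot of n = 13 data pairs")
plt.show()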
The simple linear regression relationship in the population is described by the equation for a straight line:

Y = α + βX

where α and β are population parameters (hence constants), estimated by the sample statistics a and b; hence the sample regression equation is Ŷ = a + bX.
The only way to determine the population parameters α and β would be to possess all the data for
the entire population. Since this is nearly always impossible, we have to estimate these
parameters from a sample of n data, where n is the number of pairs of X and Y values. The
calculations required to arrive at such estimates, as well as to execute the testing of a variety of
important hypotheses, involve the computation of sums of squared deviations from the mean.
Recall that the "sum of squares" of the Xi values is defined as Σ(Xi − X̄)², which is more easily obtained on a calculator by the "machine formula"

Σ(Xi − X̄)² = ΣXi² − (ΣXi)²/n

Here it will be convenient to define xi = Xi − X̄, so that this sum of squares can be abbreviated as Σxi² or, more simply, as Σx².
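As a quick check, this sketch (continuing the Python session above, with the same hypothetical X values) shows that the definitional and machine formulas agree:

n = len(X)
x_bar = sum(X) / n

# Definitional formula: sum of squared deviations from the mean.
ss_x_def = sum((Xi - x_bar) ** 2 for Xi in X)

# Machine formula: sum(X^2) - (sum(X))^2 / n.
ss_x_machine = sum(Xi ** 2 for Xi in X) - sum(X) ** 2 / n

assert abs(ss_x_def - ss_x_machine) < 1e-9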
Another quantity needed for regression analysis is referred to as the sum of the cross products of deviations from the mean:

Σxy = Σ(Xi − X̄)(Yi − Ȳ)

where y denotes a deviation of a Y value from the mean of all Y's, just as x denotes a deviation of an X value from the mean of all X's. The sum of the cross products, analogously to the sum of squares, has a simple-to-use "machine formula":

Σxy = ΣXiYi − (ΣXi)(ΣYi)/n
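Analogously, a short continuation of the sketch verifies both forms of the sum of cross products:

y_bar = sum(Y) / n

# Definitional formula: sum of products of paired deviations.
sum_xy_def = sum((Xi - x_bar) * (Yi - y_bar) for Xi, Yi in zip(X, Y))

# Machine formula: sum(X*Y) - sum(X)*sum(Y)/n.
sum_xy_machine = sum(Xi * Yi for Xi, Yi in zip(X, Y)) - sum(X) * sum(Y) / n

assert abs(sum_xy_def - sum_xy_machine) < 1e-9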
(a) The Regression Coefficient: The parameter β is termed the regression coefficient, or the slope of the best-fit regression line. The best sample estimate of β is

b = Σxy / Σx²

Although the denominator in this calculation is always positive, the numerator may be positive, negative, or zero, and the value of b can theoretically range from −∞ to +∞, including zero.
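Continuing the sketch, the slope is the ratio of the two quantities just computed; numpy.polyfit serves here only as an independent check:

import numpy as np

b = sum_xy_def / ss_x_def  # b = sum(xy) / sum(x^2)

# Independent check against numpy's least-squares line (degree-1 polynomial).
b_check, a_check = np.polyfit(X, Y, 1)
assert abs(b - b_check) < 1e-9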
(b) The Y Intercept: An infinite number of lines possess any stated slope, all of them parallel. However, each such line can be defined uniquely by stating, in addition to β, any one point on the line, that is, any pair of coordinates (Xi, Yi). The point conventionally chosen is the point on the line where X = 0. The value of Y in the population at this point is the parameter α, which is called the Y intercept. It can be shown mathematically that the point (X̄, Ȳ) always lies on the best-fit regression line. Thus, substituting X̄ and Ȳ in the equation of a straight line, we find that

a = Ȳ − bX̄
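In the same sketch, the intercept follows from the two means, and the claim that (X̄, Ȳ) lies on the fitted line is easy to verify:

a = y_bar - b * x_bar  # a = Y_bar - b * X_bar

# The point (X_bar, Y_bar) lies exactly on the fitted line.
assert abs((a + b * x_bar) - y_bar) < 1e-9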
For any given slope, there exist an infinite number of possible regression lines, each with a
different Y intercept. Similarly, for any given Y intercept, there exist an infinite number of
possible regression lines, each with a different slope. Three of the infinite number are shown
here.
Predicting Values of Y: Knowing the parameter estimates a and b for the linear regression equation, we can predict the value of the dependent variable expected in the population at a stated value of Xi. For example, the wing length of a sparrow at 13.0 days of age would be predicted using the regression equation as

Ŷ = a + b(13.0)
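A one-line continuation of the sketch makes this prediction at X = 13.0 (the a and b here come from the hypothetical data above, not from the lecture's sparrow data):

X_new = 13.0
Y_hat = a + b * X_new  # predicted Y at X = 13.0
print(f"Predicted Y at X = {X_new}: {Y_hat:.3f}")

Such predictions are trustworthy only when the assumptions underlying regression analysis hold, which are as follows: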
1. For each value of X, the values of Y are to have come at random from the sampled
population and are to be independent of one another. That is, obtaining a particular Y
from the population is in no way dependent upon the obtaining of any other Y.
2. For any value of X in the population there exists a normal distribution of Y values.
3. There is homogeneity of variances in the population; that is, the variances of the
distributions of Y values must all be equal to each other.
4. In the population, the mean of the Y's at a given X lies on a straight line with the means of the Y's at all other X's. That is, the actual relationship between Y and X is linear.
5. The measurements of X were obtained without error. This, of course, is typically
impossible; so what we do in practice is assume that the errors in measuring X are
negligible, or at least small, compared with errors in measuring Y.
Violations of assumptions 2, 3, or 4 can sometimes be countered by transformation of the data. When assumption 3 is violated, the residual mean square is underestimated, which inflates the test statistic (F or t) and thus increases the probability of a Type I error.
Heteroscedastic data may sometimes be analyzed advantageously by a procedure known as
weighted regression, which will not be discussed here.
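In practice, assumptions 2, 3, and 4 are often screened by eye with a residual plot; here is a minimal sketch, continuing with the hypothetical fit above (the interpretation comments are standard diagnostics, not from the lecture):

import matplotlib.pyplot as plt

fitted = [a + b * Xi for Xi in X]
residuals = [Yi - f for Yi, f in zip(Y, fitted)]

# Residuals vs. fitted values: curvature suggests non-linearity (assumption 4);
# a funnel shape suggests heterogeneous variances (assumption 3).
plt.scatter(fitted, residuals)
plt.axhline(0)
plt.xlabel("Fitted Y")
plt.ylabel("Residual")
plt.show()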
Regression statistics are known to be robust with respect to at least some of these underlying
assumptions. So violations of them are not usually of concern unless they are severe.
The slope, b, of the regression line computed from the sample data expresses quantitatively the
straight-line dependence of Y on X in the sample. But what is really desired is information about
the functional relationship (if any) in the population from which the sample came. Indeed, the
finding of a dependence of Y on X in the sample (i.e., b ≠ 0) does not necessarily mean that there
is a dependence in the population (i.e., β ≠ 0). Consider the following figure, a scatter plot
representing a population of data points with no dependence of Y on X; the best-fit regression
line for this population would be parallel to the X axis (i.e., the slope, β, would be zero).
However, it is possible, by random sampling, to obtain a sample of data points having the five
values circled in the figure. By calculating b for this sample of five, we would estimate that β
was positive, even though it is, in fact, zero.
A hypothetical population of data points, having a regression coefficient, β, of zero. The
circled points are a possible sample of five.
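This sampling phenomenon is easy to reproduce in simulation; in the following sketch (seed and population values are arbitrary choices) β = 0 by construction, yet the fitted b is generally nonzero:

import numpy as np

rng = np.random.default_rng(1)

# Population with no dependence of Y on X (beta = 0): Y is pure noise around 5.0.
X_sample = rng.uniform(0, 10, size=5)
Y_sample = rng.normal(loc=5.0, scale=1.0, size=5)

# Slope estimated from this random sample of five points.
b_sample, _ = np.polyfit(X_sample, Y_sample, 1)
print(f"Sample b = {b_sample:.3f} even though population beta = 0")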
We are not likely to obtain five such points out of this population, but we desire to assess just
how likely it is; therefore, we can set up a null hypothesis, Ho: β = 0, and the alternate
hypothesis, HA: β ≠ 0, appropriate to that assessment. If we conclude that there is a reasonable
probability (i.e., a probability greater than the chosen level of significance, say 5%) that the
calculated b could have come from sampling a population with β = 0, then Ho is not rejected. If
the probability of obtaining the calculated b is small (say, 5% or less), then Ho is rejected, and
HA is assumed to be true.
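This two-tailed test of Ho: β = 0 can be carried out with scipy.stats.linregress, which reports the slope together with its p-value; a sketch reusing the hypothetical data from above:

from scipy import stats

# Two-sided test of Ho: beta = 0 against HA: beta != 0.
result = stats.linregress(X, Y)
print(f"b = {result.slope:.3f}, p = {result.pvalue:.4f}")
if result.pvalue <= 0.05:
    print("Reject Ho: beta = 0 at the 5% significance level")
else:
    print("Do not reject Ho")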
The Ho may be tested by an analysis of variance (ANOVA) procedure. First, the overall variability of the dependent variable is calculated by computing the sum of squares of deviations of the Yi values from Ȳ, a quantity termed the total sum of squares:

total SS = Σ(Yi − Ȳ)² = Σy²

Then we determine the amount of variability among the Yi values that is attributable to there being a linear regression; this is termed the linear regression sum of squares:

regression SS = (Σxy)² / Σx²
Summary of calculations (the standard regression ANOVA layout implied by the quantities above):

Source of variation    Sum of squares (SS)          DF       Mean square (MS)
Total                  Σy²                          n − 1
Linear regression      (Σxy)² / Σx²                 1        regression SS / 1
Residual               total SS − regression SS     n − 2    residual SS / (n − 2)

F = regression MS / residual MS, with 1 and n − 2 degrees of freedom.
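A sketch assembling these quantities into the F test, continuing with the same hypothetical data (scipy is assumed only for the F distribution's tail probability):

from scipy import stats

total_SS = sum((Yi - y_bar) ** 2 for Yi in Y)  # sum(y^2)
regression_SS = sum_xy_def ** 2 / ss_x_def     # (sum(xy))^2 / sum(x^2)
residual_SS = total_SS - regression_SS

regression_MS = regression_SS / 1              # regression DF = 1
residual_MS = residual_SS / (n - 2)            # residual DF = n - 2

F = regression_MS / residual_MS
p = stats.f.sf(F, 1, n - 2)                    # upper-tail F probability
print(f"F = {F:.3f}, p = {p:.4f}")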