
Correlation & Regression Analysis


VJCET - S4 -PSM - MODULE 5

Simple Linear Regression and Correlation


Regression Analysis

The objective of regression analysis is to exploit the relationship between
two (or more) variables so that we can gain information about one of them through
knowing values of the other(s).
Much of mathematics is devoted to studying variables that are
deterministically related. Saying that x and y are related in this manner means
that once we are told the value of x, the value of y is completely specified.
There are many variables x and y that would appear to be related to one
another, but not in a deterministic fashion. A familiar example is given by
variables x = high school grade point average (GPA) and y = college GPA. The
value of y cannot be determined just from knowledge of x, and two different
individuals could have the same x value but have very different y values.

Regression analysis is the part of statistics that investigates the relationship
between two or more variables related in a nondeterministic fashion. In this
chapter, we generalize the deterministic linear relation y = b0 + b1x to a linear
probabilistic relationship, develop procedures for making various inferences based
on the model, and obtain a quantitative measure (the correlation coefficient) of the
extent to which the two variables are related.

The Simple Linear Regression Model

The simplest deterministic mathematical relationship between two variables
x and y is a linear relationship y = b0 + b1x. The set of pairs (x, y) for which
y = b0 + b1x determines a straight line with slope b1 and y-intercept b0. The
objective of this section is to develop a linear probabilistic model.
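
For reference, the linear probabilistic model is usually written in the form below. This is the standard textbook statement, given here as an assumption about notation: the random error term ε and its variance σ² are symbols introduced for this sketch, while b0 and b1 follow the notation already used above.

```latex
Y = b_0 + b_1 x + \varepsilon, \qquad E(\varepsilon) = 0, \qquad V(\varepsilon) = \sigma^2
```

Under this form the mean of Y at a fixed x is b0 + b1x, so observed points scatter about the line rather than falling exactly on it.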

More generally, the variable whose value is fixed by the experimenter will
be denoted by x and will be called the independent, predictor, or explanatory
variable. For fixed x, the second variable will be random; we denote this random
variable and its observed value by Y and y, respectively, and refer to it as the
dependent or response variable.
Usually observations will be made for a number of settings of the independent
variable. Let x1, x2, ..., xn denote values of the independent variable for which
observations are made, and let Yi and yi, respectively, denote the random variable
and observed value associated with xi. The available bivariate data then
consists of the n pairs (x1, y1), (x2, y2), ..., (xn, yn). A picture of this data,
called a scatterplot, gives preliminary impressions about the nature of any
relationship. In such a plot, each (xi, yi) is represented as a point plotted on a
two-dimensional coordinate system.

Estimating Model Parameters

normal equations:
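
In the standard least squares derivation, the estimates are chosen to minimize the sum of squared vertical deviations of the points from the line. Setting the two partial derivatives to zero gives the system sketched below. The hats (marking estimates) and the shorthand Sxy and Sxx are notational assumptions made for this sketch.

```latex
\begin{aligned}
n\,\hat{b}_0 + \Big(\sum x_i\Big)\hat{b}_1 &= \sum y_i \\
\Big(\sum x_i\Big)\hat{b}_0 + \Big(\sum x_i^2\Big)\hat{b}_1 &= \sum x_i y_i
\end{aligned}
\qquad\Longrightarrow\qquad
\hat{b}_1 = \frac{S_{xy}}{S_{xx}}
          = \frac{\sum x_i y_i - \big(\sum x_i\big)\big(\sum y_i\big)/n}{\sum x_i^2 - \big(\sum x_i\big)^2/n},
\qquad
\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}
```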


Example 1) The cetane number is a critical property in specifying the
ignition quality of a fuel used in a diesel engine. Determination of this
number for a biodiesel fuel is expensive and time-consuming. The article
“Relating the Cetane Number of Biodiesel Fuels to Their Fatty Acid
Composition: A Critical Study” (J. of Automobile Engr., 2009: 565–583)
included the following data on x = iodine value (g) and y = cetane number
for a sample of 14 biofuels. The iodine value is the amount of iodine
necessary to saturate a sample of 100 g of oil. The article’s authors fit the
simple linear regression model to this data, so let’s follow their lead.
Fit the simple linear regression model to this data.

Since the mean value of x is 93.392857 and the mean value of y is 55.657143,
the estimated intercept of the true regression line (i.e., the intercept of the
least squares line) is 75.212, obtained as the mean of y minus the estimated
slope times the mean of x.
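
The intercept calculation can be checked numerically. The sketch below uses only the two sample means quoted above and the slope as it is rounded in the fitted line given for this example; the function name `intercept_from_means` is a hypothetical helper, not something from the notes.

```python
# Sample means quoted in the notes for the cetane data (n = 14 biofuels).
x_bar = 93.392857  # mean iodine value
y_bar = 55.657143  # mean cetane number

# Slope as rounded in the fitted line reported in the notes (y = 75.212 - 0.2094x).
slope = -0.2094


def intercept_from_means(x_bar: float, y_bar: float, slope: float) -> float:
    """Least squares intercept: the fitted line passes through (x_bar, y_bar)."""
    return y_bar - slope * x_bar


intercept = intercept_from_means(x_bar, y_bar, slope)
print(round(intercept, 2))  # 75.21
```

The third decimal differs slightly from the 75.212 in the fitted equation, presumably because the slope used here is already rounded to four decimal places.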


The equation of the estimated regression line (least squares line) is
y = 75.212 - 0.2094x

The cetane number corresponding to an iodine value of 100 is obtained by
putting x = 100 in the fitted equation.
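
As a small sketch of this substitution, the function below simply evaluates the fitted line quoted in the notes. The name `predict_cetane` is hypothetical and chosen only for illustration.

```python
def predict_cetane(iodine_value: float) -> float:
    """Evaluate the fitted line y = 75.212 - 0.2094x from the notes."""
    return 75.212 - 0.2094 * iodine_value


prediction = predict_cetane(100)
print(round(prediction, 3))  # 54.272
```

So the fitted line predicts a cetane number of about 54.27 for a biofuel with iodine value 100.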

Example 2)
Fit the simple linear regression model to this data,


Example 3


Estimating σ²

The parameter σ² (the variance of the random deviation about the true
regression line) determines the amount of variability inherent in the
regression model. A large value of σ² will lead to observed (xi, yi)’s that
are typically quite spread out about the true regression line, whereas when
σ² is small the observed points will tend to fall very close to the true line.

RESIDUALS
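
A residual is the vertical deviation of an observed point from the fitted line, that is, the observed y minus the fitted value at the same x. The sketch below fits a least squares line and returns the residuals. The data are made up purely for illustration and are not taken from the notes, and `fit_line` and `residuals` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Residuals e_i = y_i - (intercept + slope * x_i) from the least squares fit."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up illustrative data (NOT from the notes).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

res = residuals(xs, ys)
print([round(e, 2) for e in res])
print(round(sum(res), 10))  # residuals from a least squares fit sum to zero
```

Because the fitted line passes through the point of means, the residuals always sum to zero up to rounding error, which is a quick sanity check on any hand calculation.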

Example 4) Japan’s high population density has resulted in a multitude of
resource-usage problems. One especially serious difficulty concerns waste
removal. An important part of the investigation involved relating the moisture
content of compressed pellets (y, in %) to the machine’s filtration rate (x, in
kg-DS/m/hr). The following data was read from a graph in the article:


The Error sum of squares
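
In its usual textbook form, the error sum of squares adds up the squared residuals, and dividing it by n − 2 gives the estimate of σ². The sketch below states these definitions together with the common computational shortcut. The hats on the estimated coefficients are a notational assumption made here.

```latex
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
             = \sum_{i=1}^{n} \big[\, y_i - (\hat{b}_0 + \hat{b}_1 x_i) \,\big]^2
             = \sum y_i^2 - \hat{b}_0 \sum y_i - \hat{b}_1 \sum x_i y_i,
\qquad
\hat{\sigma}^2 = s^2 = \frac{\mathrm{SSE}}{n-2}
```

The divisor is n − 2 rather than n because two parameters (intercept and slope) are estimated before the residuals can be computed.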

The Coefficient of Determination
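
The coefficient of determination is conventionally defined from the error sum of squares (SSE) and the total sum of squares (SST), with an equivalent form in terms of the regression sum of squares (SSR). The sketch below gives both; the symbol names are the usual textbook ones and are assumed here.

```latex
\mathrm{SST} = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\qquad
r^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
\quad\text{or}\quad
r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}, \quad \mathrm{SSR} = \mathrm{SST} - \mathrm{SSE}
```

It is interpreted as the proportion of the observed variation in y that the simple linear regression model explains.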



Correlation
The Sample correlation coefficient r
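
The sample correlation coefficient is usually computed as r = Sxy / sqrt(Sxx · Syy). The sketch below implements that formula directly. The data are made up for illustration only, and `sample_correlation` is a hypothetical helper name.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data (NOT from the notes).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(xs, ys)
print(round(r, 4))      # 0.7746
print(round(r * r, 4))  # 0.6
```

Swapping the roles of x and y, or changing the units of either variable by a positive linear rescaling, leaves r unchanged, and in simple linear regression the square of r equals the coefficient of determination.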

Nature of correlation

Example


Residuals and Standardized Residuals
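
In the usual textbook treatment, each residual is divided by its own estimated standard deviation to put all residuals on a common scale. The sketch below states that definition, assuming s denotes the square root of SSE/(n − 2) and Sxx the sum of squared deviations of the x values from their mean.

```latex
e_i = y_i - \hat{y}_i,
\qquad
e_i^{*} = \frac{e_i}{\, s \sqrt{\,1 - \dfrac{1}{n} - \dfrac{(x_i - \bar{x})^2}{S_{xx}}\,}\,},
\qquad i = 1, \dots, n
```

Standardized residuals far from zero (say, beyond about 2 in absolute value) flag observations that the fitted line describes poorly.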


Example


Polynomial regression model
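
The kth-degree polynomial regression model is commonly written as below. This is the standard form, stated as an assumption about notation: the coefficients follow the b0, b1 convention of these notes and ε is a random error term with mean 0 and variance σ².

```latex
Y = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k + \varepsilon,
\qquad E(\varepsilon) = 0, \quad V(\varepsilon) = \sigma^2
```

With k = 1 this reduces to the simple linear regression model, and k = 2 gives the quadratic model.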

Estimating Parameters

normal equations
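
For the quadratic case (k = 2), minimizing the sum of squared deviations with respect to the three coefficients leads to the system sketched below. The hats marking estimates are a notational assumption, and every sum runs over i = 1, ..., n.

```latex
\begin{aligned}
n\,\hat{b}_0 + \hat{b}_1 \sum x_i + \hat{b}_2 \sum x_i^2 &= \sum y_i \\
\hat{b}_0 \sum x_i + \hat{b}_1 \sum x_i^2 + \hat{b}_2 \sum x_i^3 &= \sum x_i y_i \\
\hat{b}_0 \sum x_i^2 + \hat{b}_1 \sum x_i^3 + \hat{b}_2 \sum x_i^4 &= \sum x_i^2 y_i
\end{aligned}
```

A kth-degree fit follows the same pattern, giving k + 1 linear equations in the k + 1 unknown coefficients.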


Example


Coefficient of multiple determination (R²)
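
For a kth-degree polynomial fit, R² is usually defined in the same way as r² in the straight-line case, with SSE computed from the fitted polynomial. The sketch below also gives the common adjusted version, which penalizes extra terms. The symbols SSE, SST, n and k are assumed to carry their usual textbook meanings.

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}},
\qquad
\text{adjusted } R^2 = 1 - \frac{n-1}{n-(k+1)} \cdot \frac{\mathrm{SSE}}{\mathrm{SST}}
```

Since R² can never decrease when a higher-degree term is added, the adjusted value is the more useful guide when comparing polynomials of different degree.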
