
CORRELATION & REGRESSION

Prepared By
Nitin Varshney
Assistant Professor
Agricultural Statistics
CoA, NAU, Waghai.
 The study related to the characteristics of only one variable
such as height, weight, age, marks, wages etc. is known as
univariate analysis.
 The study related to the relationship between two variables
such as height & weight is known as bivariate analysis.

CORRELATION
 When we study two or more variables simultaneously, we
observe that movements in one variable are accompanied by
movements in the other variable.
 Example:
 Husband’s age and wife’s age move together
 Scores on IQ tests move with scores in university
examinations
 Relation b/w income & expenditure of a household.
 Relation b/w price & demand of a commodity.
Meaning of Correlation
 In a bivariate distribution (study of two variables), we are
interested to find out if there is any correlation or
covariation b/w the two variables.
 If a change in one variable is accompanied by a change in the
other variable, the variables are said to be correlated.

Types of Correlation
 Positive and negative
 Linear and Non-linear
 Multiple and Partial
Positive and negative correlation
 If the two variables deviate in the same direction i.e. if the
increase (or decrease) in one results in a corresponding
increase (or decrease) in the other, correlation is said to be
direct or positive.
Example: Correlation between
 Heights & weights of a group of persons
 Income & expenditure
 If the two variables deviate in opposite directions i.e. if the
increase (or decrease) in one results in a corresponding
decrease (or increase) in the other, correlation is said to be
inverse or negative.
Example: Correlation between
 Price & demand of a commodity
 Volume & pressure of a perfect gas
Linear and non-linear correlation
 If the ratio of change b/w the two variables is constant then
there will be linear correlation b/w them. Consider the
following example:

X 2 4 6 8 10 12 14 16
Y 3 6 9 12 15 18 21 24

 Here the ratio of change b/w the two variables is the same.
 If we plot these points on a graph we will get a straight
line.
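The constant-ratio claim can be checked numerically; a minimal Python sketch using the table above:

```python
# Check that Y/X is constant for the table above,
# so the points fall on a straight line through the origin
xs = [2, 4, 6, 8, 10, 12, 14, 16]
ys = [3, 6, 9, 12, 15, 18, 21, 24]

ratios = [y / x for x, y in zip(xs, ys)]
print(ratios)  # every ratio is 1.5, i.e. the points lie on the line Y = 1.5 X
```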
 If the amount of change in one variable does not bear a
constant ratio to the change in the other variable, there will be
curvilinear or non-linear correlation b/w them.

X 2 4 6 8 10 12 14 16 18 20 22 24
Y 2 6 8 12 18 24 36 44 54 67 75 89
Multiple and Partial Correlation
 There may be interrelationships between many variables,
with the value of one variable influenced by many other
variables, e.g. the yield of crop per acre (X1) may depend
upon quality of seed (X2), fertility of soil (X3), fertilizer
used (X4), irrigation facilities (X5), weather conditions (X6)
and so on.
 Whenever we are interested in studying the joint effect of
a group of variables upon a variable, then the correlation
is known as multiple correlation.
 The correlation b/w only two variables X1 and X2, while
eliminating the linear effect of the other variables, is known as
partial correlation.
SCATTER DIAGRAM
 It is the simplest way of diagrammatic representation of
bivariate data.
 For a bivariate distribution (xi, yi); i=1, 2, …, n, if the values
of the variables X and Y are plotted along the x-axis and y-
axis respectively in the x-y plane, the diagram of dots so
obtained is known as a scatter diagram.
 From the scatter diagram, we can form a fairly good idea
whether the variables are correlated or not, e.g.
 If the points are very dense (very close to each other):
there is good correlation between the variables.
 If the points are widely scattered: there is poor correlation
between the variables.

Karl Pearson’s coefficient of correlation
 Karl Pearson developed a formula called the correlation
coefficient as a measure of the intensity or degree of linear
relationship between two variables.
 The correlation coefficient between two random variables X
and Y, usually denoted by r(X, Y) or rXY, is a numerical
measure of the linear relationship between them. It is
defined as:

r(X, Y) = Cov(X, Y) / (σX σY)

 It provides a measure of the linear relationship between X and Y.
 If (xi, yi); i=1, 2, …, n is the bivariate distribution, then

Covariance = Cov(X, Y) = σXY = E[{X − E(X)}{Y − E(Y)}] = (1/n) Σ (xi − x̄)(yi − ȳ)

Variance = σX² = E{X − E(X)}² = (1/n) Σ (xi − x̄)²

Variance = σY² = E{Y − E(Y)}² = (1/n) Σ (yi − ȳ)²

Expanding these gives the computational forms:

σX² = (1/n) Σ (xi² + x̄² − 2 xi x̄) = (1/n) Σ xi² − x̄²

σY² = (1/n) Σ (yi² + ȳ² − 2 yi ȳ) = (1/n) Σ yi² − ȳ²

Cov(X, Y) = (1/n) Σ (xi yi − xi ȳ − x̄ yi + x̄ ȳ) = (1/n) Σ xi yi − x̄ ȳ
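These computational forms can be applied directly; a short Python sketch, reusing the perfectly linear table from the earlier example (so r should come out as 1):

```python
import math

# Pearson's r via the computational formulas:
#   Cov(X,Y) = (1/n) Σ xi·yi − x̄·ȳ,  σX² = (1/n) Σ xi² − x̄²
x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [3, 6, 9, 12, 15, 18, 21, 24]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
cov = sum(a * b for a, b in zip(x, y)) / n - xbar * ybar
var_x = sum(a * a for a in x) / n - xbar ** 2
var_y = sum(b * b for b in y) / n - ybar ** 2
r = cov / math.sqrt(var_x * var_y)
print(round(r, 6))  # 1.0 — perfect positive correlation for perfectly linear data
```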
PROPERTIES OF CORRELATION COEFFICIENT
 Range of r is −1 to +1.
 r is independent of change of origin and scale.
 Two independent variables are uncorrelated (though
uncorrelated variables need not be independent).

Interpretation of correlation coefficient
 When r = 1, there is perfect positive correlation b/w the variables.
 When r = −1, there is perfect negative correlation b/w the variables.
 When r = 0, there is no linear relation b/w the variables.
 When the value of r lies b/w −1 and +1, it signifies that there is
some correlation b/w the variables.
 When the value of r is close to +1 or −1, it signifies high
positive or negative correlation b/w the variables.
 When the value of r is close to 0, it signifies weak
correlation b/w the variables.
RANK CORRELATION
 This method is useful to study qualitative attributes
like honesty, intelligence, colour, beauty, morality etc.
 This method is based on the ranks of the characters under study.
 A group of individuals is arranged in order of merit or
proficiency in any two characters A and B.
 Example: If we want to find the relation between
intelligence and beauty.
A: Intelligence B: Beauty
Ranks xi yi i=1, 2, 3, …, n
 The Pearsonian coefficient of correlation between the ranks
xi and yi is called the rank correlation coefficient between A and B
for that group of individuals.
SPEARMAN'S RANK CORRELATION COEFFICIENT
 This method was developed by Charles Edward Spearman.
 Spearman’s formula for the rank correlation coefficient is
given by

ρ = 1 − 6 Σ di² / [n(n² − 1)]

 di is the difference between ranks: di = xi − yi.
 Range of the rank correlation coefficient is −1 to +1.

Q. 2. In a marketing survey, the prices of tea and coffee in a town
were recorded based on quality. The data are given as follows;
find the relation b/w the price of tea and the price of coffee.

Price of Tea 88 90 95 70 60 75 50
Price of Coffee 120 134 150 115 110 140 100
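A worked sketch of Q.2 in Python, ranking prices from highest (rank 1) to lowest and applying Spearman's formula. The `ranks` helper is ours, not from the text, and assumes no tied values (true for this data):

```python
# Spearman's rho for Q.2: rank correlation between tea and coffee prices
tea = [88, 90, 95, 70, 60, 75, 50]
coffee = [120, 134, 150, 115, 110, 140, 100]

def ranks(values):
    # rank 1 = highest price; assumes no ties, as in this dataset
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

rx, ry = ranks(tea), ranks(coffee)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # Σ di²
n = len(tea)
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(round(rho, 3))  # 0.893 — strong positive rank correlation
```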
REGRESSION
 The term “regression” literally means “stepping back towards
the average”.
 It was introduced by Sir Francis Galton.
 Galton found that the offspring of abnormally tall or short
parents tend to regress or step back towards the average
population height.
 Regression analysis is a mathematical measure of the
average relationship between two or more variables in
terms of the original units of the data.
 In regression analysis there are two types of variables.

Dependent Variable (also called the regressed or explained
variable): the variable whose value is influenced or is to be
predicted.
Independent Variable (also called the regressor, explanatory,
or predictor variable): the variable which influences the values
or is used for prediction.
LINEAR REGRESSION
 If the variables in a bivariate distribution are related
(means variables are correlated), we will find that the
points in the scatter diagram will cluster round some
curve called the “curve of regression”.
 If the curve is a straight line, it is called the “line of
regression”.
 Then there is said to be linear regression between the
variables, otherwise curvilinear regression.

Linear Regression Equation
Let us suppose that in the bivariate distribution (xi, yi);
i=1, 2, 3, …, n; Y is the dependent variable and X is the
independent variable. Let the line of regression of Y on X
be
Y = a + bX (a, b are constants)
 There are two regression lines.
 If Y is the dependent variable and X is the independent variable, then
it is called the line of regression of Y on X:
Y = a + byx X
 If X is the dependent variable and Y is the independent variable, then
it is called the line of regression of X on Y:
X = a + bxy Y
 where byx is the regression coefficient (slope) of the regression line
of Y on X,
 and bxy is the regression coefficient (slope) of the regression line
of X on Y.
 The line of regression is the line which gives the best estimate
to the value of one variable for any specific value of the other
variable.
 Thus the line of regression is the line of best fit.
 It is obtained by the principle of least squares.
PRINCIPLE OF LEAST SQUARES
 Let the line of regression of Y on X be
Y= a+ byx X
 ei = yi − (a + byx xi) is called the error of estimate or residual
for yi.
 According to the principle of least squares, we have to
determine a and byx so that
E = Σ ei² = Σ (yi − a − byx xi)²

is minimum.
 Setting the partial derivatives ∂E/∂a and ∂E/∂byx equal to zero,
we get two normal equations for estimating a and byx:

Σ yi = n a + byx Σ xi      (i)

Σ xi yi = a Σ xi + byx Σ xi²      (ii)
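Equations (i) and (ii) form a 2×2 linear system that can be solved in closed form; a minimal Python sketch with made-up illustrative data:

```python
# Solve normal equations (i) and (ii) for a and byx (illustrative data)
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
n = len(x)
Sx, Sy = sum(x), sum(y)                     # Σxi, Σyi
Sxx = sum(v * v for v in x)                 # Σxi²
Sxy = sum(a * b for a, b in zip(x, y))      # Σxi·yi

# Eliminating a between (i) and (ii) gives byx; back-substituting into (i) gives a
byx = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
a = (Sy - byx * Sx) / n
print(f"Y = {a} + {byx} X")  # Y = 0.5 + 1.6 X
```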
 If we divide eqn. (i) by n then we get
ȳ = a + byx x̄
 Thus the line of regression of Y on X passes through the point
(x̄, ȳ).
 So the regression coefficient (slope) of the line of regression of Y
on X is given by

byx = Cov(x, y) / V(x) = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]   and   a = ȳ − byx x̄
 Similarly, the regression coefficient (slope) of the line of regression of X
on Y is given by

bxy = Cov(x, y) / V(y) = [Σxy − (Σx)(Σy)/n] / [Σy² − (Σy)²/n]   and   a = x̄ − bxy ȳ
 Since byx is the slope of the regression of Y on X and since the
line of regression passes through the point (x̄, ȳ), its equation
is

Y − ȳ = byx (X − x̄) = [Cov(X, Y) / V(X)] (X − x̄) = r (σY/σX) (X − x̄)

 Similarly for the line of X on Y:

X − x̄ = bxy (Y − ȳ) = [Cov(X, Y) / V(Y)] (Y − ȳ) = r (σX/σY) (Y − ȳ)

 These follow because

r = Cov(X, Y) / √(V(X)·V(Y))  ⇒  Cov(X, Y) = r σX σY
byx = Cov(X, Y) / V(X)  ⇒  Cov(X, Y) = byx σX²

so r σX σY = byx σX²  ⇒  byx = r (σY/σX), and similarly bxy = r (σX/σY).
PROPERTIES OF REGRESSION COEFFICIENT
1. Fundamental Property: The correlation coefficient is the geometric
mean between the regression coefficients:
bxy · byx = [r (σX/σY)] · [r (σY/σX)] = r²  ⇒  r = ±√(bxy · byx)
2. Signature Property: The sign of the correlation coefficient is the same as
that of the regression coefficients. Thus if the regression coefficients
are positive then the correlation coefficient will be positive and
vice versa.
3. Magnitude Property: If one of the regression coefficients is
greater than unity, the other must be less than unity:
If |byx| > 1 then |bxy| < 1
4. Mean Property: The modulus value of the arithmetic mean of the
regression coefficients is not less than the modulus value of the
correlation coefficient r:
(1/2) |bxy + byx| ≥ |r|
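Properties 1–4 are easy to verify numerically; a sketch on a small made-up dataset:

```python
import math

# Verify the regression-coefficient properties on illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
vx = sum((a - xbar) ** 2 for a in x) / n
vy = sum((b - ybar) ** 2 for b in y) / n
r = cov / math.sqrt(vx * vy)
byx, bxy = cov / vx, cov / vy

# 1. Fundamental: r² equals the product of the regression coefficients
assert math.isclose(r * r, byx * bxy)
# 2. Signature: r and both coefficients share the same sign
assert r > 0 and byx > 0 and bxy > 0
# 3. Magnitude: both coefficients cannot exceed 1 in absolute value
assert not (abs(byx) > 1 and abs(bxy) > 1)
# 4. Mean: |(bxy + byx)/2| >= |r|
assert abs(bxy + byx) / 2 >= abs(r)
```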
5. Regression coefficients are independent of the change of
origin but not of scale.
6. Angle between two lines of regression: If θ is the acute
angle between the two lines of regression, then

θ = tan⁻¹ [ |(1 − r²)/r| · σX σY / (σX² + σY²) ]

 If r = 0, tan θ = ∞ → θ = 90°. Thus if the two variables are uncorrelated,
the lines of regression are perpendicular to each other.
 If r = ±1, tan θ = 0 → θ = 0° or 180°. Thus if the two variables are
perfectly correlated, the lines of regression coincide.

Q.3. From a paddy field, 15 plants were selected randomly. The length of
panicle (cm) and number of grains per panicle were recorded. Fit the
regression line for the given dataset and compute the estimated number of
grains per panicle if the panicle length is 25.2 cm.

Length of Panicle (cm)    22.4 23.3 24.1 24.3 23.5 23.1 21 20.6 26.4 25.4 23.4 21.4 23.6 24.5 22.5
No. of grains per panicle 95 109 133 132 136 116 94 85 143 138 129 88 127 142 110
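A worked sketch of Q.3 in Python, using the computational formula for byx from the regression section; the predicted value (about 141 grains) is our own computation, not given in the text:

```python
# Regression of grains per panicle (Y) on panicle length (X) for Q.3
x = [22.4, 23.3, 24.1, 24.3, 23.5, 23.1, 21, 20.6,
     26.4, 25.4, 23.4, 21.4, 23.6, 24.5, 22.5]
y = [95, 109, 133, 132, 136, 116, 94, 85,
     143, 138, 129, 88, 127, 142, 110]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
# byx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]
byx = (sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n) / (
    sum(a * a for a in x) - sum(x) ** 2 / n
)
a = ybar - byx * xbar  # intercept: a = ȳ − byx·x̄
print(f"Fitted line: Y = {a:.2f} + {byx:.3f} X")
print(f"Estimated grains at 25.2 cm: {a + byx * 25.2:.1f}")  # ≈ 141 grains
```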
