Unit 12 - Simple Correlation and Regression

Uploaded by

yogeshkadav
Copyright
© © All Rights Reserved
Statistics for Management Unit 12

Unit 12 Simple Correlation and Regression


Structure:
12.1 Introduction
Objectives
Relevance
12.2 Correlation
Causation and Correlation
Types of Correlation
12.3 Methods of Correlation
12.4 Measures of Correlation
Scatter diagram
Karl Pearson’s Correlation Coefficient
Properties of Karl Pearson’s Correlation Coefficient
Factors influencing the size of Correlation Coefficient
12.5 Probable Error
Conditions under which Probable Error is used
12.6 Spearman’s Rank Correlation Coefficient
12.7 Partial Correlation
12.8 Multiple Correlations
12.9 Regression
Regression analysis
Regression lines
Regression coefficient
12.10 Standard Error of Estimate
12.11 Multiple Regression Analysis
Applications of Multiple Regression
12.12 Summary
12.13 Glossary
12.14 Terminal Questions
12.15 Answers
12.16 Case Study

Manipal University Jaipur Page No. 438


12.1 Introduction
In the previous unit, we dealt with analysis of variance (ANOVA),
assumptions for F-test, and classification of ANOVA. In this unit, we will
deal with correlation, methods of correlation, measures of correlation,
probable error, Spearman’s rank correlation coefficient, partial correlation,
multiple correlations, regression, standard error of estimate, multiple
regression analysis, and application of multiple regressions.
Both correlation and regression describe relationships between variables.
Correlation measures the strength and direction of an association, while
regression expresses the dependence of one variable on another as an
equation. Both tools are widely used in social science research.
Objectives:
After studying this unit, you should be able to:
• define correlation and regression
• discuss the types and measures of correlation
• calculate Karl Pearson’s correlation coefficient
• calculate the coefficients of partial and multiple correlation
• apply the method of estimating unknown values from known values
  through regression equations
12.1.1 Relevance
The new CEO of a health care pharmaceutical company called a meeting of
the heads of all departments to discuss the future strategy of the
company. While he expressed satisfaction over the growing sales of the
company, he also emphasised the need to give a further boost to its sales
and image. The head of the R and D unit suggested investing more funds in
the innovation of new products and the improvement of existing ones. He
pointed out that R and D made the most significant contribution to the
sales of the company. The head of the Marketing department emphasised the
importance of marketing strategy for boosting the sales of the company.
He, therefore, wanted more funds to be made available for the purpose.
The head of the HRD department suggested the need for more staff and new
training programmes to improve sales significantly. The CEO agreed with
them in principle but expected some analysis of quantitative facts and
figures to evaluate the claims of the heads of departments before
committing funds to the new strategies. The job was entrusted to a
consultant, who analysed the data using statistical techniques in general,
and correlation and regression analysis in particular, to assess the
impact of R and D, Marketing and HRD initiatives in boosting the sales of
the company, and thus helped the CEO take appropriate decisions based on
an analytical approach.
(Source: Srivastava, T. N. and Rejo, S. (2008), Statistics for Management, 5th
edition, TMH)

12.2 Correlation
When two or more variables move in sympathy with one another, they are
said to be correlated. If both variables move in the same direction, they
are said to be positively correlated. If the variables move in opposite
directions, they are said to be negatively correlated. If they move
haphazardly, there is no correlation between them. Correlation analysis
deals with the following:
• Measuring the relationship between variables.
• Testing the relationship for its significance.
• Giving a confidence interval for the population correlation measure.

Following are some of the definitions:

According to Croxton and Cowden, “When the relationship is of a
quantitative nature, the appropriate statistical tool for discovering and
measuring the relationship and expressing it in a brief formula is known as
correlation”.

According to A. M. Tuttle, “Correlation is an analysis of the covariation
between two or more variables.”

According to W. A. Neiswanger, “Correlation analysis contributes to the
understanding of economic behaviour, aids in locating the critically important
variables on which others depend, may reveal to the economist the
connections by which disturbances spread, and suggest to him the paths
through which stabilising forces may become effective.”

According to Tippett, “The effect of correlation is to reduce the range of
uncertainty of our prediction.”

12.2.1 Causation and correlation
A correlation between two variables does not by itself show that one
causes the other. An observed correlation may arise for the following
reasons:
• Small sample size: a correlation may be present in the sample and not in
  the population.
• A third factor: for example, a correlation between the yields of rice
  and tea may be due to a third factor, ‘rain’.
12.2.2 Types of correlation
The following are the three categories of correlation:
1. Positive or negative
2. Simple, partial, and multiple
3. Linear and non-linear
1. Positive and negative correlations: If the two variables (X and Y) vary
in the same direction, so that Y increases when X increases and decreases
when X decreases, the correlation is known as positive correlation. For
example, price and supply of a commodity. Correlation is said to be
negative or inverse if the variables vary in opposite directions, i.e., if
an increase (decrease) in the values of one variable results, on the
average, in a corresponding decrease (increase) in the values of the other
variable. For example, temperature and sale of woollen garments.
2. Simple, partial, and multiple correlations: In simple correlation, the
relationship between two variables is studied. In partial and multiple
correlations, three or more variables are studied. In multiple
correlation, all the variables are studied simultaneously. In partial
correlation, the effect of one variable is held constant and the
relationship between the other two variables is studied. For example,
suppose that we have three variables: number of hours studied (x), IQ (y),
and marks obtained (z). In a multiple correlation, we study the
correlation of z with the two variables x and y together. In contrast,
when we study the relationship between x and z, keeping IQ constant at an
average level, the study involves partial correlation.
3. Linear and non-linear correlation: This distinction depends upon the
constancy of the ratio of change between the variables. In linear
correlation, a unit change in one variable is accompanied by a constant
amount of change in the other variable, so the relationship can be written
in the form Y = aX + b and plots as a straight line. The relationship
between two variables is said to be non-linear or curvilinear if,
corresponding to a unit change in one variable, the other variable does
not change at a constant rate but at a fluctuating rate. When such data
are plotted on a graph, the points do not lie on a straight line.

12.3 Methods of Correlation

Methods of correlation
   Graphic:    Scatter diagram
   Algebraic:  Covariance method
               Rank correlation
               Concurrent deviation method

Fig. 12.1: Methods of correlation

12.4 Measures of Correlation


The following are three methods through which we understand the
measures of correlation:
i. Scatter diagram
ii. Karl Pearson’s correlation coefficient
iii. Spearman’s rank correlation coefficient
12.4.1 Scatter diagram
Each ordered pair of observed values is plotted on the XY plane as a dot.
Therefore, the scatter diagram is also known as a dot diagram. It is a
diagrammatic representation of the relationship.

Interpreting a scatter plot
If the dots lie exactly on a straight line that runs from bottom left to
top right, then the variables are said to be perfectly positively
correlated. Figure 12.2 depicts the scatter diagram for perfectly
positively correlated variables.

Fig. 12.2: Perfect Positive Correlation

If the dots lie close to a straight line that runs from bottom left to top
right, then the variables are said to be positively correlated. Figure 12.3
depicts the scatter diagram for positively correlated variables.

Fig. 12.3: Positive Correlation

If the dots lie exactly on a straight line that runs from top left to
bottom right, then the variables are said to be perfectly negatively
correlated. Figure 12.4 depicts the scatter diagram for perfectly
negatively correlated variables.

Fig. 12.4: Perfect Negative Correlation

If the dots lie close to a straight line that runs from top left to bottom
right, then the variables are said to be negatively correlated. Figure 12.5
depicts the scatter diagram for negatively correlated variables.

Fig. 12.5: Negative Correlation

If the dots lie scattered all over the graph paper, then the variables
have zero correlation. Figure 12.6 depicts the scatter diagram of
variables with zero correlation.

Fig. 12.6: Zero Correlation

A scatter diagram shows the direction in which the variables are related,
but it does not give a quantitative measure that allows comparison between
data sets.
12.4.2 Karl Pearson’s correlation coefficient
The correlation coefficient is a mathematical measure of the intensity, or
magnitude, of the linear relationship between two variable series. It is
used to study the “degree of variation” between the variables in a
bivariate distribution.

Key Statistic
Karl Pearson’s correlation coefficient is defined as:
$$ r = \frac{\mathrm{Cov}(X, Y)}{\mathrm{S.D.}(X)\,\mathrm{S.D.}(Y)} $$
i) $$ r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \text{(A)} $$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$,
$$ \sigma_x^2 = \frac{\sum (X - \bar{X})^2}{N} \quad \text{and} \quad \sigma_y^2 = \frac{\sum (Y - \bar{Y})^2}{N} $$
‘N’ is the number of paired observations, and $\frac{\sum xy}{N}$ is
called the covariance of ‘x’ and ‘y’.

12.4.3 Properties of Karl Pearson’s correlation coefficient
The following are the properties of Karl Pearson’s correlation coefficient:
• Its value always lies between –1 and +1
• It is not affected by change of origin or change of scale
• It is a relative measure. It does not have any unit attached to it

Key Statistic
The other forms of Karl Pearson’s correlation coefficient formula are:
ii) $$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \qquad \text{(B)} $$
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}} \qquad \text{(C)} $$
$$ r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{\left[N \sum dx^2 - (\sum dx)^2\right]\left[N \sum dy^2 - (\sum dy)^2\right]}} \qquad \text{(D)} $$
where dx and dy are the deviations of X and Y from any conveniently
chosen values (for example, dx = X – 50 and dy = Y – 55 in Solved
Problem 2).

For most practical purposes, form (D) is the most convenient. When only
summary information is given, choose the appropriate form from (A) to (C).
12.4.4 Factors influencing the size of correlation coefficient
The size of ‘r’ is very much dependent upon the variability of measured
values in the correlation sample. The greater the variability, the higher will
be the correlation, everything else being equal. The size of ‘r’ is altered
when researchers select extreme groups of subjects in order to compare
these groups with respect to certain behaviours. Selecting extreme groups
on one variable increases the size of ‘r’ over what would be obtained with
more random sampling.
Combining two groups which differ in their mean values on one of the
variables is not likely to faithfully represent the true situation as far as the
correlation is concerned.
Inclusion of an extreme case (and similarly dropping of an extreme case)
can lead to changes in the amount of correlation.

Process of calculating coefficient of correlation when deviations are
taken from arithmetic mean
• Calculate the means of the two series, X and Y.
• Take deviations in the two series from their respective means, indicated
  as x and y. The deviation should be taken in each case as the value of
  the individual item minus (–) the arithmetic mean.
• Square the deviations in both the series and obtain the sum of the
  respective squares of deviation. This gives ∑x² and ∑y².
• Take the products of the deviations and sum them to obtain ∑xy. That is,
  each deviation is multiplied by the corresponding deviation in the
  other series, and the products are then added.
• Substitute the values ∑xy, ∑x² and ∑y² obtained in the preceding steps
  in the formula for correlation:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$
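
The steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original unit: the function name `pearson_from_mean_deviations` is hypothetical, and the sample data are the five pairs of Table 12.1 (Solved Problem 1).

```python
from math import sqrt


def pearson_from_mean_deviations(xs, ys):
    """Karl Pearson's r using deviations taken from the arithmetic means."""
    n = len(xs)
    mean_x = sum(xs) / n                      # step 1: means of both series
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]          # step 2: deviations x and y
    dev_y = [y - mean_y for y in ys]
    sum_x2 = sum(d * d for d in dev_x)        # step 3: sums of squared deviations
    sum_y2 = sum(d * d for d in dev_y)
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))  # step 4: sum of products
    return sum_xy / sqrt(sum_x2 * sum_y2)     # step 5: substitute in the formula


# Data of Table 12.1 (Solved Problem 1)
x_values = [20, 16, 12, 8, 4]
y_values = [22, 14, 4, 12, 8]
r_table_12_1 = pearson_from_mean_deviations(x_values, y_values)
print(round(r_table_12_1, 4))
```

For these data the means are both 12, so ∑xy = 120, ∑x² = 160 and ∑y² = 184, which gives the same coefficient (about 0.70) as the raw-sums formula used in Solved Problem 1.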

Solved Problem 1
Find Karl Pearson’s correlation coefficient for the data depicted in table
12.1.
Table 12.1: Data Related to Solved Problem 1
X    20   16   12    8    4
Y    22   14    4   12    8

Solution:
Table 12.1a depicts the sums calculated for the data depicted in table 12.1.
Table 12.1a: Sums Related to Solved Problem 1
   X      Y     X²     Y²     XY
  20     22    400    484    440
  16     14    256    196    224
  12      4    144     16     48
   8     12     64    144     96
   4      8     16     64     32
∑X = 60   ∑Y = 60   ∑X² = 880   ∑Y² = 904   ∑XY = 840

Applying the formula for ‘r’ and substituting the respective values from the
table, we get:
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}} $$
$$ r = \frac{5(840) - (60)(60)}{\sqrt{\left[5(880) - (60)^2\right]\left[5(904) - (60)^2\right]}} = \frac{600}{\sqrt{800 \times 920}} = 0.70 $$
Hence, Karl Pearson’s correlation coefficient is 0.70.

Solved Problem 2
Calculate the correlation coefficient from the data depicted in table 12.2.
Table 12.2: Data Related to Solved Problem 2
X    50   60   58   47   49   33   65   43   46   68
Y    48   65   50   48   55   58   63   48   50   70

Solution:
Table 12.2a depicts the computations for solved problem 2, with deviations
taken from the assumed values 50 (for X) and 55 (for Y).
Table 12.2a: Computation Table for Solved Problem 2
   X   dx = X-50    dx²     Y   dy = Y-55    dy²    dx·dy
  50        0         0    48       -7        49        0
  60      +10       100    65      +10       100     +100
  58       +8        64    50       -5        25      -40
  47       -3         9    48       -7        49      +21
  49       -1         1    55        0         0        0
  33      -17       289    58       +3         9      -51
  65      +15       225    63       +8        64     +120
  43       -7        49    48       -7        49      +49
  46       -4        16    50       -5        25      +20
  68      +18       324    70      +15       225     +270
∑X = 519   ∑dx = 19   ∑dx² = 1077   ∑Y = 555   ∑dy = 5   ∑dy² = 595
∑dx·dy = 489

Using the formula for calculating ‘r’:
$$ r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{\left[N \sum dx^2 - (\sum dx)^2\right]\left[N \sum dy^2 - (\sum dy)^2\right]}} $$
and substituting the values, we get:
$$ r = \frac{10 \times 489 - 19 \times 5}{\sqrt{\left[10 \times 1077 - (19)^2\right]\left[10 \times 595 - (5)^2\right]}} = \frac{4795}{\sqrt{10409 \times 5925}} = 0.611 $$
Therefore, Karl Pearson’s correlation coefficient is r = 0.611.
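
Because the coefficient is not affected by a change of origin, the deviations may be taken from any convenient values. The short Python sketch below (the function name `pearson_assumed_means` and its parameters `a` and `b` are hypothetical, introduced only for illustration) applies the deviation formula to the data of Table 12.2 with two different origins.

```python
from math import sqrt


def pearson_assumed_means(xs, ys, a, b):
    """Pearson's r from deviations dx = X - a and dy = Y - b (assumed origins)."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator


# Data of Table 12.2 (Solved Problem 2)
xs = [50, 60, 58, 47, 49, 33, 65, 43, 46, 68]
ys = [48, 65, 50, 48, 55, 58, 63, 48, 50, 70]

r_origin_50_55 = pearson_assumed_means(xs, ys, 50, 55)  # origins used in the solution
r_origin_0_0 = pearson_assumed_means(xs, ys, 0, 0)      # raw values, i.e. the raw-sums form
print(round(r_origin_50_55, 3), round(r_origin_0_0, 3))
```

Both calls return the same value, which is why a round number close to the data (here 50 and 55) can be chosen simply to keep the hand arithmetic small.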

Solved Problem 3
In bivariate data on ‘x’ and ‘y’, variance of ‘x’ = 49, variance of ‘y’ = 9
and covariance Cov(x, y) = –17.5. Find the coefficient of correlation
between ‘x’ and ‘y’.
Solution:
We know that:
$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} $$
Given $\mathrm{Cov}(x, y) = \frac{\sum xy}{N} = -17.5$,
$\sigma_x = \sqrt{49} = 7$ and $\sigma_y = \sqrt{9} = 3$.
$$ r = \frac{-17.5}{7 \times 3} = -0.833 $$
Hence, there is a high negative correlation.

Solved Problem 4
Ten observations on weight (X) and height (Y) of a particular age group
gave the following data.

∑X = 56   ∑Y = 138   ∑X² = 1357   ∑Y² = 2136   ∑XY = 836

Find ‘r’.

Solution:
We know that:
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}} $$
Given N = 10, ∑X = 56, ∑Y = 138, ∑X² = 1357, ∑Y² = 2136 and ∑XY = 836,
$$ r = \frac{10 \times 836 - (56)(138)}{\sqrt{\left[10 \times 1357 - (56)^2\right]\left[10 \times 2136 - (138)^2\right]}} = \frac{632}{\sqrt{10434 \times 2316}} = 0.1286 $$
Hence, Karl Pearson’s correlation coefficient is 0.1286.
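
When only summary totals are available, as in this problem, the raw-sums form can be evaluated directly. The following Python sketch is illustrative only; the function name `pearson_from_totals` and its argument names are assumptions, not notation from the unit.

```python
from math import sqrt


def pearson_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r computed from summary totals (raw-sums form)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Totals given in Solved Problem 4
r_problem_4 = pearson_from_totals(n=10, sum_x=56, sum_y=138,
                                  sum_x2=1357, sum_y2=2136, sum_xy=836)
print(round(r_problem_4, 4))
```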

12.5 Probable Error


It measures the extent to which the correlation coefficient is dependable. It
is an old measure for testing the reliability of ‘r’. It is given by:
$$ P.E. = 0.6745 \, \frac{1 - r^2}{\sqrt{n}} $$
where ‘r’ is measured from a sample of size ‘n’.
Probable error is used to:
i) Interpret the value of ‘r’:
   • If r < P.E., then it is not at all significant
   • If r > 6 P.E., then ‘r’ is highly significant
   • If P.E. < r < 6 P.E., we cannot say anything about the significance
     of ‘r’
ii) Construct confidence limits within which the correlation in the
    population, ‘ρ’, is expected to lie.

If r is the observed correlation coefficient in a sample of n pairs of
observations, then its standard error, usually denoted by S.E.(r), is
given by:
$$ S.E.(r) = \frac{1 - r^2}{\sqrt{n}}, \qquad P.E.(r) = 0.6745 \times S.E.(r) $$
The reason for taking the factor 0.6745 is that in a normal distribution 50%
of the distribution lies in the range μ ± 0.6745 σ.

12.5.1 Conditions under which probable error is used


The following are some conditions under which probable error (P. E.) is
used.
1. Samples should be drawn from a normal population
2. The value of ‘r’ must be determined from sample values
3. Samples must have been selected at random

Solved Problem 5
If r = 0.6 and n = 64, then:
a) Interpret ‘r’
b) Find the limits within which ‘ρ’ is supposed to lie
Solution:
$$ P.E. = 0.6745 \times \frac{1 - (0.6)^2}{\sqrt{64}} = 0.054 $$
a) 6 P.E. = 6 × 0.054 = 0.324
   Since r = 0.6 > 6 P.E., r is highly significant.
b) Limits for the population ‘ρ’ = 0.6 ± 0.054
Hence, the limits within which ‘ρ’ lies are 0.546 and 0.654.
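
The probable-error rules can be collected into a small helper. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, and comparing the absolute value of r with P.E. (so that negative coefficients are handled too) is an assumption added here, since the unit states the rules for r itself.

```python
from math import sqrt


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def interpret_r(r, n):
    """Apply the P.E. rules; |r| is used so negative r is treated alike (assumption)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "highly significant"
    return "inconclusive"


def population_limits(r, n):
    """Limits r - P.E. and r + P.E. for the population correlation."""
    pe = probable_error(r, n)
    return r - pe, r + pe


# Values of Solved Problem 5
pe_value = probable_error(0.6, 64)
verdict = interpret_r(0.6, 64)
lower, upper = population_limits(0.6, 64)
print(round(pe_value, 3), verdict, round(lower, 3), round(upper, 3))
```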

12.6 Spearman’s Rank Correlation Coefficient


Karl Pearson’s correlation coefficient assumes that:
i. Samples are drawn from a normal population
ii. The variables under study are affected by a large number of
independent causes so as to form a normal distribution
When we do not know the shape of the population distribution, or when the
data are of a qualitative type, Spearman’s rank correlation coefficient is
used to measure the relationship.

Key Statistic
Spearman’s rank correlation coefficient is defined as:
$$ \rho = 1 - \frac{6 \sum D^2}{N^3 - N} $$
where D is the difference between the ranks assigned to the variables and
N is the number of observations.
The value of ‘ρ’ lies between ‘–1’ and ‘+1’ and its interpretation is the
same as that of Karl Pearson’s correlation coefficient.
There are three types of problems. Table 12.3 depicts the types of problems
involved in calculating the rank correlation coefficient.
Table 12.3: Types of Problems
Type i      Ranks are assigned
Type ii     Ranks are not assigned
Type iii    Ranks are repeated

Type i: Ranks are assigned: When ranks are already assigned, take the
difference between the ranks of the variables and denote it by D. Then the
rank correlation is computed using the formula:
$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$
Solved Problem 6
In a singing competition, two judges assigned ranks to seven candidates,
as depicted in table 12.4. Find Spearman’s rank correlation coefficient.
Table 12.4: Ranks of Seven Candidates
Competitor    1   2   3   4   5   6   7
Judge I       5   6   4   3   2   7   1
Judge II      6   4   5   1   2   7   3

Solution:
Table 12.4a depicts the data of solved problem 6.
Table 12.4a: Data of Seven Candidates
Competitor   R1 (Judge I)   R2 (Judge II)   D = R1 – R2    D²
    1             5               6              -1          1
    2             6               4               2          4
    3             4               5              -1          1
    4             3               1               2          4
    5             2               2               0          0
    6             7               7               0          0
    7             1               3              -2          4
N = 7                                                   ∑D² = 14

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(14)}{7(7^2 - 1)} = 1 - \frac{6 \times 14}{7 \times 48} = 0.75 $$
Hence, Spearman’s rank correlation coefficient ρ is 0.75.
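
When the ranks are already given, the computation reduces to summing the squared rank differences. A minimal Python sketch follows (the function name `spearman_from_ranks` is hypothetical), using the two judges’ ranks from Table 12.4.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's rho = 1 - 6*sum(D^2) / (N*(N^2 - 1)) for untied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks from Table 12.4
judge_1 = [5, 6, 4, 3, 2, 7, 1]
judge_2 = [6, 4, 5, 1, 2, 7, 3]
rho_judges = spearman_from_ranks(judge_1, judge_2)
print(rho_judges)
```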


Type ii: Ranks are not assigned: When ranks are not given, we have to
assign the ranks to the variables either in ascending order or descending
order. Then use the same formula to compute the rank correlation.

Solved Problem 7
Find the rank difference coefficient of correlation (in case of no ties) for the
data depicted in table 12.5.
Table 12.5: Scores of Students on Test I and Test II
(X = score on Test I, Y = score on Test II, R1 = rank on Test I,
R2 = rank on Test II, D = R1 – R2 = difference between ranks)
Student     X     Y    R1    R2     D    D²
   A       16     8     2     5    -3     9
   B       14    14     3     3     0     0
   C       18    12     1     4    -3     9
   D       10    16     4     2     2     4
   E        2    20     5     1     4    16
N = 5                                ∑D² = 38

Applying the formula, we get:
$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(38)}{5(5^2 - 1)} = 1 - 1.9 = -0.9 $$
The relationship between the scores on Test I (‘x’) and Test II (‘y’) is
very high and inverse.
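
When ranks are not given, they must first be assigned from the raw values. The Python sketch below is illustrative and assumes that no value is repeated within a series; the helper names `descending_ranks` and `spearman_rho` are hypothetical. Rank 1 is given to the largest value, as in Table 12.5.

```python
def descending_ranks(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Assign ranks to both series, then apply 1 - 6*sum(D^2)/(N*(N^2 - 1))."""
    n = len(xs)
    ranks_x = descending_ranks(xs)
    ranks_y = descending_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Scores from Table 12.5
test_1 = [16, 14, 18, 10, 2]
test_2 = [8, 14, 12, 16, 20]
ranks_test_1 = descending_ranks(test_1)
ranks_test_2 = descending_ranks(test_2)
rho_tests = spearman_rho(test_1, test_2)
print(ranks_test_1, ranks_test_2, round(rho_tests, 2))
```

Ranking both series in ascending order instead would give the same coefficient, provided the same order is used for both.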

Solved Problem 8
Table 12.6 depicts the sales statistics of six sales representatives in two
different localities. Find whether there is a relationship between the buying
habits of the people in the localities.
Table 12.6: Sales Data of Six Representatives
Representative 1 2 3 4 5 6
Locality I 70 40 65 110 60 20
Locality II 70 30 80 100 90 20

Solution:
Table 12.6a depicts the ranks of the sales figures and the calculation of
the rank differences for solved problem 8.
Table 12.6a: Calculating the Coefficient of Correlation
(R1 = rank of sales in Locality I, R2 = rank of sales in Locality II)
Representative    R1    R2    D = R1 – R2    D²
      1            2     4        -2          4
      2            5     5         0          0
      3            3     3         0          0
      4            1     1         0          0
      5            4     2         2          4
      6            6     6         0          0
N = 6                                    ∑D² = 8

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(8)}{6(6^2 - 1)} = 1 - \frac{8}{35} = 0.7714 $$
Therefore, there is a high positive correlation between the buying habits
of the people in the two localities.

Type iii: When ranks are repeated
In the case of attributes, a tie arises if two or more individuals are
placed together in any classification with respect to an attribute. In the
case of variable data, a tie arises if more than one item has the same
value in either or both the series. Spearman’s formula for calculating the
rank correlation coefficient then breaks down, since the variables X (the
ranks of individuals in characteristic A, the 1st series) and Y (the ranks
of individuals in characteristic B, the 2nd series) do not take the values
from 1 to n. Consequently $\bar{X} \neq \bar{Y}$, whereas in deriving the
formula we had assumed that $\bar{X} = \bar{Y}$.
When two or more values are equal, all of them are assigned the same
common rank. This common rank is the arithmetic mean of the ranks which
these items would have got if they were different from each other, and the
next item gets the rank next to the ranks used in computing the common
rank. For example, suppose an item is repeated twice at rank 4. Then the
common rank to be assigned to each item is (4 + 5) / 2, i.e., 4.5, which
is the average of 4 and 5, the ranks which these observations would have
assumed if they were different. The next item will be assigned the rank 6.
If an item is repeated thrice at rank 7, then the common rank to be
assigned to each value will be (7 + 8 + 9) / 3, i.e., 8, which is the
arithmetic mean of 7, 8, and 9, the ranks these observations would have
got if they were different from each other. The next rank to be assigned
will be 10.
If only a small proportion of the ranks are tied, this technique may be
applied together with the formula. If a large proportion of the ranks are
tied, it is advisable to apply a correction factor: for every rank that
repeats m times, add the factor (m³ – m) / 12, i.e., m(m² – 1) / 12, to
∑D². This correction factor is to be added for each repeated value in both
the series.

Solved Problem 9
Find the rank correlation coefficient for the data depicted in table 12.7.
Table 12.7: Scores of Students in Test I and Test II
Student             A    B    C    D    E    F    G    H    I    J
Score on Test I    20   30   22   28   32   40   20   16   14   18
Score on Test II   32   32   48   36   44   48   28   20   24   28

Solution:
Table 12.7a depicts the required data for calculating the correlation
coefficient.
Table 12.7a: Ranks of Test I and Test II
(X = score on Test I, Y = score on Test II, R1 = rank on Test I,
R2 = rank on Test II, D = R1 – R2 = difference between ranks)
Student     X     Y     R1     R2      D       D²
   A       20    32    6.5    5.5     1.0     1.00
   B       30    32    3      5.5    -2.5     6.25
   C       22    48    5      1.5     3.5    12.25
   D       28    36    4      4       0       0
   E       32    44    2      3      -1.0     1.00
   F       40    48    1      1.5    -0.5     0.25
   G       20    28    6.5    7.5    -1.0     1.00
   H       16    20    9     10      -1.0     1.00
   I       14    24   10      9       1.0     1.00
   J       18    28    8      7.5     0.5     0.25
N = 10                                   ∑D² = 24

$$ \rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \frac{1}{12}(m_4^3 - m_4)\right]}{N(N^2 - 1)} $$
where $m_i$ represents the number of times a rank is repeated. Here four
values are each repeated twice (20 on Test I; 48, 32 and 28 on Test II),
so each correction is (2³ – 2) / 12 = 0.5.
$$ \rho = 1 - \frac{6\left[24 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)} $$
$$ \rho = 1 - \frac{6\left[24 + 0.5 + 0.5 + 0.5 + 0.5\right]}{10 \times 99} = 1 - \frac{156}{990} = 0.8424 $$
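
The tie-handling procedure can be sketched in Python as follows. This is an illustration only, with hypothetical helper names (`average_ranks`, `spearman_with_ties`); it assigns each tied value the mean of the ranks it occupies and applies the factor 6 to the whole bracket, i.e., to ∑D² plus the sum of the (m³ – m) / 12 corrections, as the formula is written.

```python
from collections import Counter


def average_ranks(values):
    """Descending ranks; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    common_rank = {}
    for value, count in Counter(ordered).items():
        first = ordered.index(value) + 1              # first rank the value occupies
        common_rank[value] = first + (count - 1) / 2  # mean of first .. first+count-1
    return [common_rank[v] for v in values]


def spearman_with_ties(xs, ys):
    """Rank correlation with the (m^3 - m)/12 correction for every tied value."""
    n = len(xs)
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    # A value occurring once contributes (1 - 1)/12 = 0, so only ties matter.
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values())
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Scores from Table 12.7
test_1_scores = [20, 30, 22, 28, 32, 40, 20, 16, 14, 18]
test_2_scores = [32, 32, 48, 36, 44, 48, 28, 20, 24, 28]
ranks_1 = average_ranks(test_1_scores)
ranks_2 = average_ranks(test_2_scores)
rho_table_12_7 = spearman_with_ties(test_1_scores, test_2_scores)
print(ranks_1, ranks_2, round(rho_table_12_7, 4))
```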

Activity:
Find the rank correlation from the following distribution.
Cost     39   65   62   90   82   75   25   98   36   78
Sales    47   53   58   86   62   68   60   91   51   84

Activity Solution
(X = cost, Y = sales, R1 = rank of cost, R2 = rank of sales, D = R1 – R2)
   X     Y    R1    R2     D    D²
  39    47     8    10    -2     4
  65    53     6     8    -2     4
  62    58     7     7     0     0
  90    86     2     2     0     0
  82    62     3     5    -2     4
  75    68     5     4     1     1
  25    60    10     6     4    16
  98    91     1     1     0     0
  36    51     9     9     0     0
  78    84     4     3     1     1
                          ∑D² = 30

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 30}{10(10^2 - 1)} = 1 - \frac{180}{990} = 0.82 $$

12.7 Partial Correlation


Partial correlation is used in situations where three or more variables
are involved. For example, the three variables may be age, height, and
weight. Age may be an important factor influencing the strength of the
relationship between height and weight, so the correlation between height
and weight can be computed keeping age constant. In this way, the effect
of one variable is removed from the correlation between the other two
variables. This statistical technique is known as partial correlation.
The correlation between variables ‘x’ and ‘y’ is denoted by $r_{xy}$.
Further, the partial correlation between ‘x’ and ‘y’ keeping the variable
‘z’ constant is denoted by $r_{xy.z}$.

Key Statistic
Partial correlation is denoted by the symbol $r_{12.3}$. Here, the
correlation between variables 1 and 2 keeping the 3rd variable constant is:
$$ r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} $$
where,
$r_{12.3}$ = partial correlation between variables 1 and 2 keeping the 3rd constant
$r_{12}$ = correlation between variables 1 and 2
$r_{13}$ = correlation between variables 1 and 3
$r_{23}$ = correlation between variables 2 and 3
Similarly,
$$ r_{13.2} = \frac{r_{13} - r_{12}\, r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \quad \text{and} \quad r_{23.1} = \frac{r_{23} - r_{12}\, r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} $$

Solved Problem 10
Given r12 = 0.8, r13 = 0.5 and r23 = 0.4, calculate all partial correlations.

Solution:
(i) The correlation between variables 1 and 2 keeping the 3rd constant is
given by:
$$ r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.8 - 0.5 \times 0.4}{\sqrt{1 - 0.5^2}\,\sqrt{1 - 0.4^2}} = \frac{0.6}{0.794} = 0.756 $$

(ii) The correlation between variables 1 and 3 keeping the 2nd constant is
given by:
$$ r_{13.2} = \frac{r_{13} - r_{12}\, r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.5 - 0.8 \times 0.4}{\sqrt{1 - 0.8^2}\,\sqrt{1 - 0.4^2}} = \frac{0.18}{0.55} = 0.33 $$

(iii) The correlation between variables 2 and 3 keeping the 1st constant is
given by:
$$ r_{23.1} = \frac{r_{23} - r_{12}\, r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{0.4 - 0.8 \times 0.5}{\sqrt{1 - 0.8^2}\,\sqrt{1 - 0.5^2}} = 0 $$
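
All three partial correlations follow one pattern, so a single helper can compute them by passing the zero order coefficients in the appropriate order. The Python sketch below is illustrative; the function name `partial_correlation` and its argument names are hypothetical.

```python
from math import sqrt


def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation between a and b with the third variable c held constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Zero order coefficients of Solved Problem 10
r12, r13, r23 = 0.8, 0.5, 0.4

r12_3 = partial_correlation(r12, r13, r23)  # variables 1 and 2, holding 3
r13_2 = partial_correlation(r13, r12, r23)  # variables 1 and 3, holding 2
r23_1 = partial_correlation(r23, r12, r13)  # variables 2 and 3, holding 1
print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
```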


Self Assessment Questions


Calculate the required correlation coefficients.
1. i. From the following data, calculate the correlation between variables 1
and 2 keeping the 3rd constant.
r12 = 0.7; r13 = 0.6 r23 = 0.4
ii. Calculate r23.1 and r13.2 from the following:
r12 = 0.60; r13 = 0.51; r23 = 0.40
iii. Given the zero order correlation coefficients, calculate the partial
correlation between variables 1 and 3 keeping the 2nd variable
constant. Interpret your result.
r12 = 0.8; r13 = 0.6; r23 = 0.5

12.8 Multiple Correlations


Three or more variables are involved in multiple correlation. The
dependent variable is denoted by X1 and the other variables are denoted by
X2, X3, etc. Gupta S.P. has expressed that “the coefficient of multiple
linear correlation is represented by R1 and it is common to add subscripts
designating the variables involved”. Thus R1.234 would represent the
coefficient of multiple linear correlation between X1 on the one hand and
X2, X3, and X4 on the other. The subscript of the dependent variable is
always to the left of the point.
The coefficients of multiple correlation R1.23, R2.13, and R3.12 can be
expressed as:
$$ R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}} $$
$$ R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{13}^2}} $$
$$ R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{12}^2}} $$
The coefficient of multiple correlation R1.23 is the same as R1.32.

A coefficient of multiple correlation is never negative and lies between
‘0’ and ‘+1’. If the coefficient of multiple correlation is ‘1’, the
correlation is perfect. If it is ‘0’, there is no linear relationship
between the variables. The coefficient of multiple determination can be
obtained by squaring R1.23.
An alternative formula for computing R1.23 is:
$$ R_{1.23} = \sqrt{r_{12}^2 + r_{13.2}^2\,(1 - r_{12}^2)} \quad \text{or} \quad R_{1.23}^2 = r_{12}^2 + r_{13.2}^2\,(1 - r_{12}^2) $$
Similarly, alternative formulas for R1.24 and R1.34 can be obtained.

Multiple correlation analysis measures the relationship between the given
variables. In this analysis, the degree of association is measured between
one variable (which is considered as the dependent variable) and a group of
other variables (which are considered as independent variables).

Solved Problem 11
The following are the zero order correlation coefficients.
r12 = 0.98; r13 = 0.44; r23 = 0.54

Calculate the multiple correlation coefficient treating the first variable as
dependent and second and third variables as independent.

Solution:
The first variable is dependent. The second and third variables are
independent. Using the formula for the multiple correlation coefficient
R1.23, we get:
$$ R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.98)^2 + (0.44)^2 - 2(0.98)(0.44)(0.54)}{1 - (0.54)^2}} = 0.986 $$
Hence the multiple correlation coefficient is 0.986.
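
The calculation can be checked with a short Python sketch. The function names below are hypothetical and serve only as an illustration; the second function evaluates the alternative formula based on the partial correlation r13.2, which should give the same value.

```python
from math import sqrt


def multiple_correlation(r12, r13, r23):
    """R1.23 from the three zero order correlation coefficients."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return sqrt(r_squared)


def multiple_correlation_alternative(r12, r13, r23):
    """R1.23 from r12 and the partial correlation r13.2 (alternative formula)."""
    r13_2 = (r13 - r12 * r23) / sqrt((1 - r12 ** 2) * (1 - r23 ** 2))
    return sqrt(r12 ** 2 + r13_2 ** 2 * (1 - r12 ** 2))


# Zero order coefficients of Solved Problem 11
R_direct = multiple_correlation(0.98, 0.44, 0.54)
R_alternative = multiple_correlation_alternative(0.98, 0.44, 0.54)
print(round(R_direct, 3), round(R_alternative, 3))
print(round(R_direct ** 2, 3))  # coefficient of multiple determination
```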

Self Assessment Questions


2. State whether the following statements are ‘True’ or ‘False’.
i. Scatter diagram does not give us a quantitative measure of
correlation coefficient.
ii. Correlation estimates the value of one variable from the knowledge
of the other.
iii. Correlation coefficient is an absolute measure.

12.9 Regression
According to M. M. Blair, regression is “the measure of the average relationship between two or more variables in terms of the original units of the data”.
Correlation analysis studies the relationship between two variables ‘X’ and ‘Y’. Regression goes further and attempts to quantify the dependence of one variable on the other. For example, if there are two variables ‘X’ and ‘Y’ and ‘Y’ depends on ‘X’, then the dependence is expressed in the form of an equation.
12.9.1 Regression analysis
Regression analysis is used to estimate the values of the dependent variable from the values of the independent variables. It is also used to obtain a measure of the error involved in using the regression line as a basis for estimation. The regression coefficient of Y on X is the coefficient of the variable ‘X’ in the line of regression of Y on X. Regression coefficients can be used to calculate the correlation coefficient, because the square of the correlation coefficient is the product of the two regression coefficients.
12.9.2 Regression lines
For a set of paired observations, there exist two straight lines of regression. The line drawn in such a way that the sum of the vertical deviations is zero and the sum of their squares is minimum is called the regression line of ‘Y’ on ‘X’. It is used to estimate ‘Y’ values for given ‘X’ values. The line drawn in such a way that the sum of the horizontal deviations is zero and the sum of their squares is minimum is called the regression line of ‘X’ on ‘Y’. It is used to estimate ‘X’ values for given ‘Y’ values. The smaller the angle between these lines, the higher is the correlation between the variables. The regression lines always intersect at $(\bar{X}, \bar{Y})$.
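The properties described above (deviations summing to zero, and both lines passing through the point of means) can be checked on a small data set. The sketch below is illustrative only: the paired observations are hypothetical and the helper `least_squares_line` is a name introduced here.

```python
def least_squares_line(u, v):
    """Fit v = a + b*u by least squares and return (a, b)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    b = s_uv / s_uu
    a = mean_v - b * mean_u
    return a, b

# Hypothetical paired observations (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares_line(xs, ys)  # Y on X: Y = a_yx + b_yx * X
a_xy, b_xy = least_squares_line(ys, xs)  # X on Y: X = a_xy + b_xy * Y

# Vertical deviations from the Y-on-X line, horizontal ones from the X-on-Y line.
vertical = [y - (a_yx + b_yx * x) for x, y in zip(xs, ys)]
horizontal = [x - (a_xy + b_xy * y) for x, y in zip(xs, ys)]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

print("Y on X: intercept", round(a_yx, 4), "slope", round(b_yx, 4))
print("X on Y: intercept", round(a_xy, 4), "slope", round(b_xy, 4))
print("sum of vertical deviations:", round(abs(sum(vertical)), 10))
print("sum of horizontal deviations:", round(abs(sum(horizontal)), 10))
```

Substituting the mean of X into the first line returns the mean of Y, and substituting the mean of Y into the second line returns the mean of X, which is why the two lines cross at the point of means.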

The regression lines have the following equations:

i) The regression equation of ‘Y’ on ‘X’ is given by:

$$Y - \bar{Y} = b_{yx} (X - \bar{X})$$

ii) The regression equation of ‘X’ on ‘Y’ is given by:

$$X - \bar{X} = b_{xy} (Y - \bar{Y})$$

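where,

$$b_{xy} = \frac{N \sum dx\,dy - (\sum dx)(\sum dy)}{N \sum dy^2 - (\sum dy)^2} \quad \text{or} \quad b_{xy} = r \frac{\sigma_x}{\sigma_y}$$

$$b_{yx} = \frac{N \sum dx\,dy - (\sum dx)(\sum dy)}{N \sum dx^2 - (\sum dx)^2} \quad \text{or} \quad b_{yx} = r \frac{\sigma_y}{\sigma_x}$$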

‘byx’ and ‘bxy’ are called regression coefficients.


12.9.3 Regression coefficient
When a regression is linear, the regression coefficient is given by the slope of the regression line.
• The geometric mean of the regression coefficients gives the correlation coefficient:
$$r^2 = b_{yx} \cdot b_{xy}, \qquad r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$
• The product of the regression coefficients never exceeds 1, that is, $b_{yx} \cdot b_{xy} \le 1$.
• The two regression coefficients and ‘r’ always have the same sign. If ‘byx’ is negative, then ‘bxy’ is also negative and ‘r’ is negative.
• They can also be expressed as:
$$b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$$
• The regression coefficient is an absolute measure.
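The properties listed above can be illustrated numerically. The sketch below uses hypothetical data with a negative relationship, so that the sign rule is visible; the variable names are assumptions made for this example.

```python
import math

# Hypothetical paired observations with a downward trend (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [9, 7, 6, 4, 4]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

sd_x = math.sqrt(s_xx / n)        # standard deviation of X
sd_y = math.sqrt(s_yy / n)        # standard deviation of Y
r = s_xy / math.sqrt(s_xx * s_yy)  # Karl Pearson's coefficient

# Regression coefficients expressed through r and the standard deviations.
b_yx = r * sd_y / sd_x
b_xy = r * sd_x / sd_y

# Geometric mean of the coefficients, carrying their common sign.
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print("b_yx =", round(b_yx, 4), " b_xy =", round(b_xy, 4))
print("product =", round(b_yx * b_xy, 4), " r =", round(r, 4))
```

Here both coefficients and r come out negative, their product equals r squared, and the product stays below 1.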
The differences between Correlation and Regression Coefficient are
depicted in table 12.8.
Table 12.8: Differences Between Correlation and Regression Coefficient
Correlation Coefficient | Regression Coefficient
The correlation coefficients, rxy = ryx | The regression coefficients, byx ≠ bxy
‘r’ lies between -1 and 1. | ‘byx’ can be greater than one, in which case ‘bxy’ must be less than one such that byx.bxy ≤ 1
It has no units attached to it. | It has units attached to it.
There exists nonsense correlation. | There is no such nonsense regression.
It is not based on cause and effect relationship. | It is based on cause and effect relationship.
It indirectly helps in estimation. | It is meant for estimation.

Solved Problem 12
Find the regression equations from the data depicted in table 12.9. Then calculate the correlation coefficient.
Table 12.9: Data of Ages of Husband and Wife
Age of Husband 18 19 20 21 22 23 24 25 26 27
Age of Wife 17 17 18 18 19 19 19 20 21 22

Solution:
Table 12.9a depicts the data required for calculation of correlation and
regression coefficients.
Table 12.9a: Data Required for Calculation of Correlation and Regression Coefficients
Age of husband X | dx = X-22 | dx2 | Age of wife Y | dy = Y-19 | dy2 | dx dy
18 | -4 | 16 | 17 | -2 | 4 | 8
19 | -3 | 9 | 17 | -2 | 4 | 6
20 | -2 | 4 | 18 | -1 | 1 | 2
21 | -1 | 1 | 18 | -1 | 1 | 1
22 | 0 | 0 | 19 | 0 | 0 | 0
23 | 1 | 1 | 19 | 0 | 0 | 0
24 | 2 | 4 | 19 | 0 | 0 | 0
25 | 3 | 9 | 20 | 1 | 1 | 3
26 | 4 | 16 | 21 | 2 | 4 | 8
27 | 5 | 25 | 22 | 3 | 9 | 15
∑X = 225 | ∑dx = 5 | ∑dx2 = 85 | ∑Y = 190 | ∑dy = 0 | ∑dy2 = 24 | ∑dxdy = 43

$$\bar{X} = \frac{225}{10} = 22.5, \qquad \bar{Y} = \frac{190}{10} = 19$$


The regression equation of Y on X is:

$$Y - \bar{Y} = b_{yx} (X - \bar{X})$$

$$b_{yx} = \frac{N \sum dx\,dy - (\sum dx)(\sum dy)}{N \sum dx^2 - (\sum dx)^2} = \frac{10 \times 43 - (5)(0)}{10 \times 85 - (5)^2} = \frac{430}{825} = 0.521$$

$$Y - 19 = 0.521 (X - 22.5)$$

$$Y = 0.521X + 7.2775$$

The regression equation of X on Y is:

$$X - \bar{X} = b_{xy} (Y - \bar{Y})$$

$$b_{xy} = \frac{N \sum dx\,dy - (\sum dx)(\sum dy)}{N \sum dy^2 - (\sum dy)^2} = \frac{10 \times 43 - (5)(0)}{10 \times 24 - (0)^2} = \frac{430}{240} = 1.792$$

$$X - 22.5 = 1.792 (Y - 19)$$

$$X = 1.792Y - 11.548$$

$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{0.521 \times 1.792} = 0.966$$

Hence, the correlation coefficient ‘r’ is 0.966.
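The working of Solved Problem 12 can be cross-checked in Python. This is a minimal sketch that follows the same deviation method (deviations taken from 22 and 19, as in table 12.9a); the variable names are chosen here for illustration.

```python
import math

husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]  # X
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]     # Y

n = len(husband)
dx = [x - 22 for x in husband]  # deviations of X from 22
dy = [y - 19 for y in wife]     # deviations of Y from 19

sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

numerator = n * sum_dxdy - sum_dx * sum_dy
b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)  # regression coefficient of Y on X
b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)  # regression coefficient of X on Y

mean_x = sum(husband) / n
mean_y = sum(wife) / n
a_yx = mean_y - b_yx * mean_x  # intercept of Y on X
a_xy = mean_x - b_xy * mean_y  # intercept of X on Y

# r takes the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print("b_yx =", round(b_yx, 3), " b_xy =", round(b_xy, 3), " r =", round(r, 3))
print("Y on X intercept:", round(a_yx, 4), " X on Y intercept:", round(a_xy, 4))
```

The intercepts printed here differ slightly in the later decimal places from 7.2775 and -11.548, because the solved problem rounds the regression coefficients to three decimals before computing the intercepts.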
Solved Problem 13
Table 12.10 depicts the results that were worked out from scores in
statistics and mathematics in a certain examination.
Table 12.10: Scores in Statistics and Mathematics
 | Scores in Statistics (X) | Scores in Mathematics (Y)
Mean | 40 | 48
Standard Deviation | 10 | 15


Karl Pearson’s correlation coefficient between ‘X’ and ‘Y’ is +0.42. Find the regression lines of ‘X’ on ‘Y’ and ‘Y’ on ‘X’. Use the regression lines to find the value of ‘Y’ when X = 50 and the value of ‘X’ when Y = 30.
Solution:
Given the following data:

$$\bar{X} = 40; \quad \bar{Y} = 48; \quad \sigma_x = 10; \quad \sigma_y = 15; \quad r = 0.42$$

The regression line of ‘X’ on ‘Y’ is:

$$X - \bar{X} = b_{xy} (Y - \bar{Y})$$

$$b_{xy} = r \frac{\sigma_x}{\sigma_y} = 0.42 \times \frac{10}{15} = 0.28$$

$$X - 40 = 0.28 (Y - 48)$$

$$X = 0.28Y + 26.56$$

The regression line of ‘Y’ on ‘X’ is:

$$Y - \bar{Y} = b_{yx} (X - \bar{X})$$

$$b_{yx} = r \frac{\sigma_y}{\sigma_x} = 0.42 \times \frac{15}{10} = 0.63$$

$$Y - 48 = 0.63 (X - 40)$$

$$Y = 0.63X + 22.8$$

Therefore,
when Y = 30: X = 0.28(30) + 26.56 = 34.96
when X = 50: Y = 0.63(50) + 22.8 = 54.3
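When only the means, standard deviations and r are known, as in Solved Problem 13, both lines follow directly from the summary figures. The sketch below is one way to express this; the function `regression_lines` is a hypothetical helper, not something defined in the text.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a_xy, b_xy), (a_yx, b_yx)) for the lines
    X = a_xy + b_xy * Y and Y = a_yx + b_yx * X."""
    b_xy = r * sd_x / sd_y
    b_yx = r * sd_y / sd_x
    a_xy = mean_x - b_xy * mean_y
    a_yx = mean_y - b_yx * mean_x
    return (a_xy, b_xy), (a_yx, b_yx)

# Summary figures from Solved Problem 13.
(a_xy, b_xy), (a_yx, b_yx) = regression_lines(40, 48, 10, 15, 0.42)

x_when_y_30 = a_xy + b_xy * 30  # estimate X from the line of X on Y
y_when_x_50 = a_yx + b_yx * 50  # estimate Y from the line of Y on X

print("X on Y: X =", round(b_xy, 2), "Y +", round(a_xy, 2))
print("Y on X: Y =", round(b_yx, 2), "X +", round(a_yx, 2))
print("X when Y = 30:", round(x_when_y_30, 2))
print("Y when X = 50:", round(y_when_x_50, 2))
```

Note that each estimate uses the line in which the unknown variable is the dependent one: X is estimated from the line of X on Y, and Y from the line of Y on X.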

12.10 Standard Error of Estimate


The standard error of estimate measures the accuracy of the estimated figures in regression analysis. If the value of the standard error of estimate is small, the estimates provided by the regression equation are close to the actual values. If the standard error of estimate is zero, there is no variation about the line and the correlation is perfect.
The standard error of estimate is the square root of the average of the squared deviations between the actual values and the estimated values based on the regression equations.


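The standard error of estimate of X values from Y is:

$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N}}$$

The standard error of estimate of Y values from X is:

$$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{N}}$$

where Yc and Xc are the estimated values of the Y and X variables from the lines of regression of Y on X and X on Y respectively.
The following simpler formulae are used for calculating Sxy and Syx:

$$S_{xy} = \sqrt{\frac{\sum X^2 - a \sum X - b \sum XY}{N}}$$

$$S_{yx} = \sqrt{\frac{\sum Y^2 - a \sum Y - b \sum XY}{N}}$$

where ‘a’ and ‘b’ are the intercept and the slope of the corresponding regression line (X = a + bY for Sxy, and Y = a + bX for Syx).
To make the squared standard error an unbiased estimate of the variance of the X or Y values about the regression line, we divide the sum of squared deviations by (N - 2) instead of N:

$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N - 2}}$$

$$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{N - 2}}$$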
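As an illustration of these formulae, the standard error of estimate of Y from X can be computed for the husband and wife ages of table 12.9. This is a sketch under the assumption that the least squares line of Y on X is fitted from the raw data (so the unrounded slope is used); it also checks that the simpler formula gives the same sum of squares as the definition.

```python
import math

husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]  # X
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]     # Y

n = len(husband)
mean_x = sum(husband) / n
mean_y = sum(wife) / n
s_xx = sum((x - mean_x) ** 2 for x in husband)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(husband, wife))

b = s_xy / s_xx          # slope of the line of Y on X
a = mean_y - b * mean_x  # intercept of the line of Y on X

# Definition: squared deviations of actual Y from estimated Yc.
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(husband, wife))

# Simpler formula: sum(Y^2) - a*sum(Y) - b*sum(XY).
shortcut_ss = (sum(y * y for y in wife)
               - a * sum(wife)
               - b * sum(x * y for x, y in zip(husband, wife)))

s_yx_n = math.sqrt(residual_ss / n)         # divisor N
s_yx_n2 = math.sqrt(residual_ss / (n - 2))  # divisor N - 2

print("S_yx with divisor N:", round(s_yx_n, 4))
print("S_yx with divisor N - 2:", round(s_yx_n2, 4))
```

The value obtained with the divisor N - 2 is slightly larger, and the gap narrows as the number of observations grows.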

12.11 Multiple Regression Analysis


Multiple regression analysis is an extension of two variable regression
analysis. In this analysis, two or more independent variables are used to
estimate the values of a dependent variable, instead of one independent
variable.
Objectives of multiple regression analysis are:
• To derive an equation which provides estimates of the dependent variable from values of the two or more independent variables
• To obtain the measure of the error involved in using the regression equation as a basis of estimation
• To obtain a measure of the proportion of variance in the dependent variable accounted for or explained by the independent variables
The multiple regression equation describes the average relationship between the given variables, and this relationship is used to estimate the dependent variable. The equation for estimating a dependent variable Y from the independent variables X1, X2, … is known as the regression equation of Y on X1, X2, ….
Let the dependent variable be Y, which depends on two independent variables X1 and X2.
The linear relationship among Y, X1 and X2 can be expressed as the regression equation of Y on X1 and X2:

$$Y = b_0 + b_1 X_1 + b_2 X_2$$

where b0 is referred to as the intercept and b1 and b2 are known as the regression coefficients.
The values of b0, b1 and b2 can be determined by solving the normal equations:

$$\sum Y_i = N b_0 + b_1 \sum X_{1i} + b_2 \sum X_{2i}$$

$$\sum X_{1i} Y_i = b_0 \sum X_{1i} + b_1 \sum X_{1i}^2 + b_2 \sum X_{1i} X_{2i}$$

$$\sum X_{2i} Y_i = b_0 \sum X_{2i} + b_1 \sum X_{1i} X_{2i} + b_2 \sum X_{2i}^2$$

The values of b0, b1 and b2 are estimated with the help of the principle of least squares.
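To show how the three normal equations yield b0, b1 and b2, the sketch below builds them from a small data set and solves them by Cramer's rule. The data are hypothetical and were constructed so that Y = 1 + 2·X1 + 3·X2 exactly; the helper names are assumptions made for this example, and in practice a statistical package would normally be used.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve_normal_equations(x1, x2, y):
    """Return (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 from the normal equations."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))

    # Coefficient matrix and right-hand side of the three normal equations.
    lhs = [[n, s1, s2],
           [s1, s11, s12],
           [s2, s12, s22]]
    rhs = [sy, s1y, s2y]

    d = det3(lhs)
    coefficients = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        m = [row[:] for row in lhs]
        for row in range(3):
            m[row][col] = rhs[row]
        coefficients.append(det3(m) / d)
    return tuple(coefficients)

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 (not from the text).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

b0, b1, b2 = solve_normal_equations(x1, x2, y)
print(b0, b1, b2)
```

Because the data lie exactly on a plane, the solution recovers the intercept 1 and the coefficients 2 and 3; with real data the fitted values would only approximate the observations.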
12.11.1 Application of Multiple Regression
Multiple regression analysis can be applied to test how factors such as export elasticity, import elasticity, and structural change (contribution of the manufacturing sector towards GDP) influence employment. Here, employment is the dependent variable.
Similarly, researchers can use multiple regression in their own research work where appropriate.

Self Assessment Questions


3. State whether the following statements are ‘True’ or ‘False’.
i. Correlation coefficient is a geometric mean between regression coefficients.
ii. The regression lines always intersect at $(\bar{X}, \bar{Y})$.
iii. $b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$
iv. The higher the angle between regression lines, the lower is the correlation coefficient.

12.12 Summary
Let us recapitulate the important concepts discussed in this unit:
• When two or more variables move in sympathy with each other, they are said to be correlated. If both variables move in the same direction, they are said to be positively correlated. If the variables move in opposite directions, they are said to be negatively correlated. If they move haphazardly, there is no correlation between them.
• Regression helps us to study unknown variables with the help of known variables. It also establishes a reliability measure for the estimated values.
• Regression analysis helps to quantify the dependence of one variable on the other. Regression can be simple or multiple, and linear or non-linear.
• Regression analysis is useful in the decision making process in business and economic scenarios.

12.13 Glossary
Correlation: When two or more variables move in sympathy with each other, they are said to be correlated.
Correlation coefficient: Critical statistic which indicates the direction and
intensity of a relationship between two continuous variables. Domain
extends from -1 through 0 to +1. Significance can be determined via
statistical testing. Both parametric and nonparametric correlation coefficients
are possible.
Coefficient of variation: A relative measure of variation, expressed as a
percentage; useful in comparing the variability of data sets with different
units of measure.


Range: A nonresistant measure of variation in data that is equal to either the minimum and maximum values or the difference between these two values. The range in a large representative sample should approach the domain of the variable of interest.
Regression: Technique to produce mathematical models of causal relationships between and among variables. Regression models are used to describe and potentially predict outcomes.

12.14 Terminal Questions


1. Table 12.11 depicts the marks obtained by 10 students in commerce
and statistics. Calculate the rank correlation.
Table 12.11: Marks of Students Obtained in Commerce and Statistics
Marks in Statistics 35 90 70 40 95 45 60 85 80 50
Marks in Commerce 45 70 65 30 90 40 50 75 85 60

2. Calculate Spearman’s rank correlation coefficient between the series A and B depicted in table 12.12.
Table 12.12: Series Data of Terminal Question 2
Series A 57 59 62 63 64 65 55 58 57
Series B 113 117 126 126 130 129 111 116 112

3. For the data in table 12.13, obtain the two lines of regression and estimate the blood pressure when the age is 50 years.
Table 12.13: Data for Terminal Question 3
Age in yrs (X) 56 42 72 39 63 47 52 49 40 42 68 60
B P (Y) 127 112 140 118 129 116 130 125 115 120 135 133

4. Table 12.14 depicts the results that were worked out from scores in
statistics and mathematics in a certain examination.
Table 12.14: Results of Scores in Statistics and Mathematics Examination
 | Scores in Statistics (X) | Scores in Mathematics (Y)
Mean | 39.5 | 47.5
Standard Deviation | 10.8 | 17.8


Karl Pearson’s correlation coefficient between X and Y = 0.42. Find both the
regression lines. Use these lines to estimate the value of Y when X = 50 and
the value of X when Y = 30.

12.15 Answers

Self Assessment Questions


1. i) Refer section 12.7
ii) Refer section 12.7
iii) Refer section 12.7
2. i) True ii) False iii) False
3. i) True ii) True iii) True iv) True

Terminal Questions
1. 0.903
2. 0.967
3. X = -95 + 1.184Y
Y = 87.2 + 0.724X
4. X = 27.62 + 0.25Y
Y = 20.24 + 0.69X

12.16 Case Study


India is ranked 126th in the Human Development Index (HDI) among 177 countries for which data is compiled, as per the report released during November 2006 and published in Hindustan Times dated November 10, 2006. HDI depends on indicators such as life expectancy, literacy, and per capita income. Use appropriate correlation and regression analysis to prepare a report on the basis of the given data.


Table 12.12: Human Development Index Table
Country | HDI Rank | Life Expectancy | Adult Literacy Rate (% ages 15 & older) | School Enrolment % | GDP Per Capita | Human Poverty Index Rank | Population (2004) | Urban Population (%)
Norway | 1 | 79.6 | NA | 100 | 38,454 | Nil | 4.6 | 77.3
Iceland | 2 | 80.9 | NA | 96 | 33,051 | Nil | 0.3 | 92.7
USA | 8 | 77.5 | NA | 93 | 39,676 | Nil | 295.4 | 80.5
Thailand | 74 | 70.3 | 92.6 | 74 | 8,090 | 19 | 63.7 | 32.0
China | 81 | 71.9 | 90.9 | 70 | 5,896 | 26 | 1,308 | 22.0
Srilanka | 93 | 74.3 | 90.7 | 63 | 4,390 | 38 | 20.6 | 15.2
India | 126 | 63.3 | 61.0 | 62 | 3,139 | 55 | 1,087.1 | 28.5

By calculating the rank correlation, find out which of the indicators, viz. life expectancy, literacy, and GDP, affects the HDI to the maximum extent. To what extent does the life expectancy in a nation depend on the percentage of its urban population?
(Source: Srivastava, T. N. and Rejo, S. (2008), Statistics for Management, 5th Edition, TMH)

References
• Agarwal, B. L. (2006), Basic Statistics, 4th Edition, New Age International Publishers.
• Bowerman, B. L. and Connel, R. T. O. (1996), Applied Statistics: Improving Business Processes, Irwin.
• Levin, R. I. and Rubin, D. S. (2008), Statistics for Management, 7th Edition, PHI Learning Private Limited.
• Pisani, F. D. R. and Purves, R. (1997), Statistics, 3rd Edition, W. W. Norton.
• Srivastava, T. N. and Rejo, S. (2008), Statistics for Management, 5th Edition, TMH.
• Tanur, J. M. (2002), Statistics: A Guide to the Unknown, 4th Edition, Brooks/Cole.
• Tukey, J. W. (1977), Exploratory Data Analysis, Addison–Wesley.
• Wilcox, R. R. (2009), Basic Statistics – Understanding Conventional Methods and Modern Insights, Oxford University Press.

E-Reference
• http://www.textbooksonline.tn.nic.in/Books/11/Stat-EM/Chapter-1.pdf
