
Qt Module II Correlation and Regression Analysis

The document provides an overview of correlation and regression analysis, detailing the definitions, types, measures, and properties of correlation, as well as regression analysis methods. It explains various correlation types such as positive, negative, perfect, and imperfect, and discusses how to measure correlation using graphical and algebraic methods. Additionally, it highlights the differences between correlation and regression, their uses, limitations, and the significance of the coefficient of determination.

Uploaded by

sgralisjaky2255
Copyright
© © All Rights Reserved

MODULE II

CORRELATION AND REGRESSION ANALYSIS


CORRELATION
Meaning and Definition
Correlation is a statistical measure of the relationship between two
variables. Two variables are said to be correlated if a change in one variable
is accompanied by a corresponding change in the other variable. The
correlation coefficient measures the strength of the association between the
two variables and also the direction of their relationship.
According to A. M. Tuttle, “Correlation is an analysis of the association
between two or more variables.”
Types of correlation
Correlation is classified as:
• Positive and Negative Correlation
When the values of two variables move in the same direction, correlation
is said to be positive, i.e. an increase in the value of one variable is
accompanied by an increase in the value of the other, or a decrease in the
value of one variable is accompanied by a decrease in the value of the
other.
When the values of two variables move in opposite directions, so that an
increase in the value of one variable is accompanied by a decrease in the
value of the other, or a decrease in the value of one variable is accompanied
by an increase in the value of the other, the correlation is said to be
negative.

• Perfect and Imperfect Correlation
If the values of the two variables change at a constant rate, the
correlation is called perfect. In other words, if the change in the value of
one variable is exactly proportional to the change in the value of the other,
it is called perfect correlation. The coefficient of a perfect correlation is
either +1 or -1.
If the values of the variables change at varying ratios, it is called
imperfect correlation. The coefficient of an imperfect correlation lies
strictly between -1 and +1.
• Linear and Non-Linear Correlation
When a change in one variable bears a constant ratio to the change in the
other variable, correlation is said to be linear. When there is a linear
correlation, the points plotted on a graph fall on a straight line.
When the change in one variable is not in a constant ratio to the
change in the other variable, correlation is said to be non-linear.

• Simple, Partial and Multiple Correlation
In the study of the relationship between variables, if there are only two
variables, the correlation is said to be simple. For example, the correlation
between price and demand is simple.
If the relationship between only two out of three or more variables is
studied, with the influence of the remaining variables held constant, it is
called partial correlation. For example: the correlation between the quantity
demanded and the price of the product, setting aside the effect of the income
of the consumers, the amount spent on advertisement, etc.
In multiple correlation, more than two variables are studied
simultaneously. Example: the study of the relationship among different
variables such as quantity demanded, price of the product and income of
consumers simultaneously.

Measures of Correlation
Correlation between two variables can be measured by both graphic and
algebraic methods.
Graphic methods are:
• Scatter diagram: The scatter diagram is a visual aid to show the presence
or absence of correlation between two variables. It is also known as a dot
diagram. Under this method, one variable is shown on the X-axis and the other
variable is shown on the Y-axis. For each pair of X and Y values, one dot is
plotted on the graph. After plotting all such dots, the degree of correlation
between the variables (X and Y) is judged by examining the pattern of the
plotted dots.
Advantages:
- It is easy to plot the points.
- It is simple to understand.
- Abnormal values in the data can be easily detected.
- The extreme values do not affect it.
- The value of the dependent variable for a given value of the independent
variable can be estimated.
Disadvantages:
- Algebraic treatment is not possible.
- The exact degree of correlation cannot be determined.
- When the number of pairs of observations is either very large or very
small, the method is not easy to apply.

• Correlation Graph
Under this method, separate curves are drawn for the X variable and the Y
variable on the same graph paper. The values of the variables are taken
as ordinates of the points plotted. From the direction and closeness of
the two curves we can infer whether the variables are related. If both
the curves move in the same direction (upward or downward),
correlation is said to be positive. If the curves move in opposite
directions, correlation is said to be negative.
Algebraic methods:
Co-efficient of correlation
Under this method, we measure correlation by finding a value known as the
coefficient of correlation using an appropriate formula.
It shows the degree or extent of correlation between two variables.
The coefficient of correlation is a pure number lying between -1 and +1. When
the correlation is negative, it lies between -1 and 0. When the correlation is
positive, it lies between 0 and +1. When the coefficient of correlation is
zero, it indicates that there is no linear correlation between the variables.
When the correlation coefficient is +1 or -1, there is perfect correlation.
Co-efficient of correlation can be computed by applying the following methods:
• Karl Pearson’s Co-efficient of correlation: This method is considered
the best measure because it shows both the direction of change in the data,
i.e. positive or negative, and the degree of correlation.
According to Karl Pearson, the coefficient of correlation lies between two
limits, i.e. between -1 and +1.
Merits:
- It shows the direction as positive or negative correlation.
- It has many algebraic properties and hence it can be used for further
algebraic treatment.
- This method not only indicates the presence or absence of correlation
but also determines the degree or extent to which the given two
variables are correlated.
- It enables us to estimate the value of a dependent variable with
reference to a particular value of an independent variable through
regression equations.
Demerits:
- It is comparatively difficult to calculate.
- It is very likely to be misinterpreted.
- It is based on certain assumptions, which may not always hold good.
- Compared to the other methods, it takes much time to arrive at the
results.
- The result is strongly affected by extreme values in the data sets.
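
As a minimal sketch of the calculation, Karl Pearson's coefficient might be
computed in Python as below. The formula is not reproduced in these notes; the
usual deviation form r = Σdxdy / √(Σdx² × Σdy²) is assumed here, where dx and dy
are deviations from the respective means. The function name and the price and
demand figures are hypothetical.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired series with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each value from its own mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Hypothetical data: price of a product and quantity demanded
price = [10, 12, 14, 16, 18]
demand = [40, 38, 35, 30, 27]
r = pearson_r(price, demand)
print(round(r, 2))
```

For these illustrative figures r comes to about -0.99, a high degree of
negative correlation. The sketch does not guard against a series with no
variation, for which the denominator is zero.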

Degree of Correlation
- Perfect correlation: When the change in the two variables is such that
with a change in the value of one, the value of the other changes in
a fixed proportion, correlation is said to be perfect. The coefficient of
correlation is +1 for perfect positive correlation and -1 for perfect
negative correlation.
- No correlation: If changes in the value of one variable are not
associated with changes in the value of the other variable, there is
no correlation. When there is no correlation, the coefficient of
correlation is zero.
- Limited degree of correlation: In between perfect correlation and no
correlation there may be a limited degree of correlation. It may be
positive or negative. A limited degree of correlation may be termed
high, moderate or low.

PROPERTIES OF CORRELATION COEFFICIENT

- The correlation coefficient has a well-defined formula.
- The correlation coefficient is a pure number and is independent of the unit
of measurement.
- It lies between -1 and +1.
- The correlation coefficient does not change with a change of
origin or a change of scale.
- The coefficient of correlation between X and Y is the same as that between Y
and X.
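
The symmetry and invariance properties listed above can be checked
numerically. The sketch below is illustrative only: the data and the helper
pearson_r are assumptions, and the change of scale uses positive factors (a
negative factor applied to one variable alone would reverse the sign of r).

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 8, 9]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)            # roles of X and Y interchanged

# Change of origin (subtract a constant) and of scale (positive factor)
u = [(a - 6) / 2 for a in x]
v = [(b - 5) * 10 for b in y]
r_uv = pearson_r(u, v)

print(r_xy, r_yx, r_uv)           # the three values agree up to rounding
```
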
PROBABLE ERROR (P.E)
It is a statistical measure which measures reliability and dependability of the value
of coefficient of correlation. If probable error is added to or subtracted from the
coefficient of correlation it would give two such limits within which we can
reasonably expect the value of coefficient of correlation to vary.
STANDARD ERROR (S.E)
It is the standard deviation of the sampling distribution of a statistic, such
as the mean. The term is used generally for the standard deviation of any
estimate.
Advantages are: (a) It helps in detecting and reducing sampling errors as well
as measurement errors. (b) The standard error of a mean indicates the accuracy
of the estimate.
Interpretation of Coefficient of correlation on the basis of probable error:
- If the coefficient of correlation is less than its probable error, it is not
at all significant.
- If the coefficient of correlation is more than six times its probable
error, it is significant.
- If the probable error is small and the coefficient of correlation
is 0.5 or more, it is generally considered to be significant.
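
The rules above could be applied as in the following sketch. The notes do not
state the formulas, so the conventional textbook forms are assumed:
S.E. of r = (1 - r²) / √n and P.E. = 0.6745 × S.E., where n is the number of
pairs of observations. The function names are hypothetical, the comparison is
made on the absolute value of r (an assumption, so that negative coefficients
are handled), and only the first two rules are coded.

```python
import math

def standard_error_r(r, n):
    """Assumed textbook form: S.E. of r = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """Assumed textbook form: P.E. = 0.6745 x S.E. of r."""
    return 0.6745 * standard_error_r(r, n)

def interpret(r, n):
    """Apply the first two rules of interpretation to |r|."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

# Hypothetical case: r = 0.8 from 25 pairs of observations
r, n = 0.8, 25
pe = probable_error(r, n)
limits = (r - pe, r + pe)   # range within which r may reasonably be expected
print(round(pe, 4), interpret(r, n))
```

Here P.E. is about 0.0486, so the limits are roughly 0.75 and 0.85, and r is
more than six times its probable error.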

• Spearman’s Rank Correlation: When the values of two variables are
expressed in ranks and correlation is obtained from those ranks, that
correlation is known as rank correlation. It is the correlation obtained from
ranks instead of from quantitative measurements.
Three cases arise in applying this method:
- When the ranks are given.
- When the values are given.
- When a value or rank is repeated.

Merits:
- It is easy to calculate.
- It is simple to understand.
- It can be applied to both quantitative and qualitative data.

Demerits:
- The rank correlation coefficient is only an approximate measure, as the
actual values are not used.
- It is not convenient when ‘n’ is large.
- Further algebraic treatment is not possible.
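
A minimal sketch of the first two cases follows. It assumes the usual formula
1 - 6Σd² / (n(n² - 1)), where d is the difference between paired ranks; this
formula is not given in the notes. The helper names and the figures are
hypothetical. Tied values are given the average of the ranks they occupy, but
the correction term that textbooks usually add to Σd² for repeated ranks is
omitted here.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_from_ranks(rank_x, rank_y):
    """Assumed formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Case 1: ranks given (two hypothetical judges ranking five entries)
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_from_ranks(judge_a, judge_b)

# Case 2: values given, so they are converted to ranks first
marks = [50, 70, 60, 70]
marks_ranks = average_ranks(marks)
print(rho, marks_ranks)
```

For the two judges Σd² is 4, so the coefficient is 1 - 24/120 = 0.8.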

• Concurrent Deviation method: Under this method, only the directions of
the deviations are taken into account; the magnitudes of the values are
ignored. It is used to indicate whether the correlation is positive or
negative, especially in data series characterized by short-term fluctuations.
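
The method might be sketched as follows, assuming the usual textbook formula
r = ±√(±(2c - n) / n), which is not stated in these notes. Here n is the number
of pairs of deviations (one less than the number of observations) and c is the
number of concurrent deviations, i.e. pairs in which both series move in the
same direction. The function names and data are hypothetical, and a pair in
which either series is unchanged is not counted as concurrent (an assumption).

```python
import math

def signs(series):
    """Direction of change from each value to the next: +1, -1 or 0."""
    return [(cur > prev) - (cur < prev) for prev, cur in zip(series, series[1:])]

def concurrent_deviation_r(x, y):
    dx, dy = signs(x), signs(y)
    n = len(dx)                                       # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)   # concurrent deviations
    ratio = (2 * c - n) / n
    # The sign of the ratio is applied both inside and outside the root
    return math.copysign(math.sqrt(abs(ratio)), ratio)

# Hypothetical series of six observations each
x = [10, 12, 11, 15, 18, 17]
y = [5, 7, 6, 9, 8, 7]
r_c = concurrent_deviation_r(x, y)
print(round(r_c, 2))
```

In this example four of the five pairs of deviations are concurrent, so the
ratio is (8 - 5) / 5 = 0.6 and the coefficient is about +0.77.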

Causation
Causation means that one variable directly influences another, for example
when one variable increases because the other decreases. In relation to
correlation, causation means that one event directly causes another event to
occur: a change in one variable directly leads to a change in another
variable, establishing a cause and effect relationship.
Uses of Correlation
- It helps to study the association between two variables.
- It measures the degree of relation between two variables.
- From the correlation coefficient, we can develop a measure called
probable error.
- Correlation analysis helps to estimate future values.
- Correlation analysis is useful in understanding economic behavior.
- It helps in finding out interrelated variables.

Limitations of Correlation
- The correlation study indicates the existence of correlation, but it does
not indicate a cause and effect relationship.
- The correlation coefficient assumes a linear relationship, regardless of
whether that assumption is correct.
- The correlation coefficient is unduly affected by extreme items in the
data.

Coefficient of determination

The coefficient of determination gives the percentage variation in the
dependent variable in relation to the independent variable. It is the square
of the correlation coefficient. It is a more useful and better measure for
interpreting the value of r. It states what percentage of the variation in the
dependent variable is explained by the independent variable.
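
A short illustration of this interpretation is sketched below; the function
name and the values of r are hypothetical.

```python
def coefficient_of_determination(r):
    """Return r squared: the share of variation in the dependent variable explained."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    return r * r

for r in (0.9, 0.6, -0.6):
    explained = coefficient_of_determination(r) * 100
    print(f"r = {r:+.1f}: {explained:.0f}% explained, {100 - explained:.0f}% unexplained")
```

A coefficient of 0.6 may look fairly high, yet it explains only 36% of the
variation in the dependent variable; the sign of r makes no difference to the
result.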

REGRESSION ANALYSIS
Regression analysis means the estimation or prediction of the
unknown value of one variable from the known value of the other variable. It
is a statistical device used to study the relationship between two or more
variables that are related.
In the words of M. M. Blair, “Regression analysis is a mathematical measure
of the average relationship between two or more variables in terms of the
original units of the data.”
In regression analysis there are two types of variables. The variable whose
value is influenced or is to be predicted is called the dependent variable,
and the variable which influences the values or is used for prediction is
called the independent variable.

Types of Regression
Regression analysis can be classified into:
- Simple and Multiple Regression: When there are only two variables,
the regression equation obtained is called a simple regression equation.
In multiple regression analysis there are more than two variables, and
we try to find out the effect of two or more independent variables on
one dependent variable.
- Linear and Non-Linear Regression: If the values of the two variables
are plotted on a graph, the points trace out the curve of regression.
The regression is termed linear if this curve of regression is a straight
line. If the curve of regression is not a straight line, it is known as
non-linear regression.
- Total and Partial Regression: In total regression analysis, all the
influencing independent variables are considered while estimating
the value of the dependent variable. In partial regression analysis,
variables which are not relevant for a given purpose are excluded.

Line of Best fit or Regression


When the given bivariate data are plotted on a graph, we get the scatter
diagram. If the points of the scatter diagram concentrate around a straight
line, that line is called the line of best fit. That is, the line of best fit
is the line which is closest to the points of the scatter diagram. This line
is known as the regression line. A regression line is a graphic technique to
show the functional relationship between the dependent and the independent
variables. It shows the average relationship between the variables.
Determination of Simple Linear Regression: It refers to the process of finding
the best fit line that describes the linear relationship between two
variables. The simple linear regression line has the form y = a + bx, where a
is the intercept and b is the slope.
Regression equations: A regression equation is a mathematical relation between
the dependent and the independent variables. There are two regression lines
and therefore two regression equations: the regression equation of y on x and
the regression equation of x on y.
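
The two regression equations might be obtained as in the sketch below. It
assumes the least-squares method, which the notes do not name, with slope
b = Σdxdy / Σdx² and intercept a = mean of y - b × mean of x. The function name
and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line of y on x, returned as (a, b) in y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_dxdy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_dx2 = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_dxdy / sum_dx2          # regression coefficient (slope)
    a = mean_y - b * mean_x         # intercept
    return a, b

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(x, y)         # regression equation of y on x
a_xy, b_xy = fit_line(y, x)         # regression equation of x on y

predicted_y = a_yx + b_yx * 6       # estimate of y when x = 6
print(a_yx, b_yx, a_xy, b_xy, predicted_y)
```

For these figures the equation of y on x is y = 2.2 + 0.6x and the equation of
x on y is x = -1 + 1.0y. The two lines differ, which is why the equation must
be chosen according to the variable being estimated.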

Uses of Regression Analysis

- It helps in establishing a functional relationship between two or more
variables.
- Regression analysis predicts the values of dependent variables from the
values of independent variables.
- With the help of regression analysis, extreme cases can be identified.
- The coefficient of correlation and the coefficient of determination can be
calculated with the help of regression coefficients.
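
The last point can be sketched as follows, using the standard relation that r
is the square root of the product of the two regression coefficients (b of y
on x and b of x on y) and takes their common sign. The function name and the
coefficient values are hypothetical.

```python
import math

def r_from_regression_coefficients(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), carrying the common sign of the coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("the two regression coefficients must have the same sign")
    r = math.sqrt(product)
    return -r if b_yx < 0 else r

# Hypothetical regression coefficients
b_yx, b_xy = 0.6, 1.0
r = r_from_regression_coefficients(b_yx, b_xy)
r_squared = b_yx * b_xy             # coefficient of determination
print(round(r, 4), r_squared)
```

With these values r is about 0.77 and the coefficient of determination is 0.6.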

Limitations of Regression analysis

- It involves lengthy procedures and complicated calculations.
- It cannot be applied in the case of qualitative phenomena.
- Linear regression assumes a linear relationship between the dependent and
independent variables. In practice, such a linear relationship exists only in
rare cases.
- Extreme values can distort the results of linear regression.

Difference between Correlation and Regression

- In correlation analysis we study the degree of relationship between the
variables, whereas in regression analysis we study the nature of the
relationship.
- In correlation analysis, the choice of the dependent variable is purely a
personal choice and is of no practical significance. In regression
analysis, one has to decide which variable shall be taken as dependent
and which as independent.
- Correlation analysis is not meant for prediction, whereas
regression analysis is basically used for prediction.

***************
