Qt Module II Correlation and Regression Analysis
Measures of Correlation
Correlation between two variables can be measured by both graphic and
algebraic methods.
Graphic methods are:
Scatter diagram: The scatter diagram is a visual aid to show the presence
or absence of correlation between two variables. It is also known as dot
diagram. Under this method, one variable is shown on X-axis and the other
variable is shown on Y-axis. For each pair of X and Y, one dot each is
plotted on the graph. After plotting all such dots, degree of correlation
between the variables (X and Y) is estimated by examining the shape of
plotted dots.
Advantages:
It is easy to plot the points.
It is simple to understand.
Abnormal values in the data can be easily detected.
The extreme values do not affect it.
The value of the dependent variable for a given value of the independent
variable can be estimated.
Disadvantages:
Algebraic treatment is not possible.
The degree of correlation cannot be easily estimated.
When the number of pairs of observations is either very large or very
small, the method is difficult to use.
Correlation Graph
Under this method, separate curves are drawn for the X variable and the Y
variable on the same graph paper. The values of the variables are taken
as ordinates of the points plotted. From the direction and closeness of
the two curves we can infer whether the variables are related. If both
the curves move in the same direction (upward or downward),
correlation is said to be positive. If the curves move in opposite
directions, correlation is said to be negative.
Algebraic methods:
Co-efficient of correlation
Under this method, we measure correlation by finding a value known as the
coefficient of correlation using an appropriate formula.
It shows the degree or extent of correlation between two variables.
Co-efficient of correlation is a pure number lying between -1 and +1. When the
correlation is negative, it lies between -1 and 0. When the correlation is positive, it
lies between 0 and +1. When the co-efficient of correlation is zero, it indicates that
there is no correlation between the variables. When the co-efficient is +1 or -1,
there is perfect correlation.
Co-efficient of correlation can be computed by applying the following methods:
Karl Pearson’s Co-efficient of correlation: This method is considered
the best measure because it shows the direction of change in the data,
i.e. positive or negative, and also shows the degree of correlation.
According to Karl Pearson, the coefficient of correlation lies between two
limits (i.e. between -1 and +1).
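The notes do not state the formula itself, so the sketch below assumes the standard deviation-from-mean form, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The function name `pearson_r` and the sample figures are hypothetical and used only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's co-efficient of correlation for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from the mean of its own series
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Hypothetical data: X = advertising spend, Y = sales
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(advertising, sales)
print(round(r, 4))  # 6 / sqrt(10 * 6) = 0.7746
```

The sign of the result gives the direction of the correlation and its size gives the degree, which are the two things the method is credited with above.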
Merits:
It shows the direction as positive or negative correlation.
It has a lot of algebraic properties and hence it can be used for further
algebraic treatment.
This method not only indicates the presence, or absence of correlation
but also determines the degree or extent to which the given two
variables are correlated.
It enables us to estimate the value of a dependent variable with
reference to a particular value of an independent variable through
regression equations.
Demerits:
It is comparatively difficult to calculate.
It is very much likely to be misinterpreted.
It is based on certain assumptions, which may not always hold good.
Compared to the other methods, it takes much time to arrive at the
results.
The result is very much affected by extreme values in the data sets.
Degree of Correlation
Perfect correlation: When the change in the two variables is such that
with a change in the value of one, the value of the other changes in
a fixed proportion, correlation is said to be perfect. Co-efficient of
correlation is +1 for perfect positive correlation and it is -1 for perfect
negative correlation.
No Correlation: If changes in the value of one variable are not
associated with changes in the value of the other variable, there will be
no correlation. When there is no correlation, the co-efficient of
correlation is zero.
Limited degree of correlation: In between perfect correlation and no
correlation there may be a limited degree of correlation. It may be
positive or negative. A limited degree of correlation may be termed
high, moderate or low.
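The degrees described above can be sketched as a small classifier. The notes give no cut-off values for "high", "moderate" and "low", so the boundaries used here (0.75 and 0.25) are an assumption taken from one common textbook convention; other sources draw the lines differently. The function name `describe_degree` is hypothetical.

```python
def describe_degree(r):
    """Put a co-efficient of correlation into words.

    The 0.75 and 0.25 boundaries are assumed for illustration only.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("co-efficient of correlation must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction} correlation"
    if size >= 0.75:
        level = "high"
    elif size >= 0.25:
        level = "moderate"
    else:
        level = "low"
    return f"{level} degree of {direction} correlation"

print(describe_degree(-0.8))  # high degree of negative correlation
```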
Rank correlation co-efficient
Merits:
It is easy to calculate.
It is simple to understand.
It can be applied to both quantitative and qualitative data.
Demerits:
Rank correlation coefficient is only an approximate measure as the actual
values are not used.
It is not convenient when ‘n’ is large.
Further algebraic treatment is not possible.
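The notes do not give the rank correlation formula, so the sketch below assumes Spearman's form for untied ranks, R = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each item. The helper names and the judges' scores are hypothetical, and tied values are deliberately rejected because they need a correction that this sketch leaves out.

```python
def ranks(values):
    """Rank the values, giving rank 1 to the largest. Assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def rank_correlation(xs, ys):
    """Rank correlation co-efficient for untied paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    rank_x, rank_y = ranks(xs), ranks(ys)
    # d = difference between the two ranks given to the same item
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n * n - 1))

# Hypothetical scores given by two judges to the same five contestants
judge_a = [86, 72, 91, 65, 78]
judge_b = [80, 75, 88, 60, 70]
rho = rank_correlation(judge_a, judge_b)
print(round(rho, 2))  # sum of d^2 is 2, so 1 - 12/120 = 0.9
```

Only the ranks enter the calculation, which is why the measure is described above as approximate: the actual scores could change without altering the result, as long as their order stays the same.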
Causation
Causation means one variable directly influences another. For example, one
variable increases because the other decreases. Causation in relation to
correlation means that one event directly causes another event to occur:
a change in one variable directly leads to a change in another variable,
establishing a cause and effect relationship.
Uses of Correlation
It helps to study the association between two variables.
It measures the degree of relationship between two variables.
From the correlation coefficient, we can develop a measure called
probable error.
Correlation analysis helps to estimate the future values.
Correlation analysis is useful in understanding economic behavior.
It helps in finding out interrelated variables.
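The probable error mentioned in the list above is not defined in these notes. A minimal sketch, assuming the usual textbook formula P.E. = 0.6745 × (1 - r²) / √n, where n is the number of pairs of observations; the function name and the figures are hypothetical.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of a co-efficient of correlation (assumed textbook formula)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("co-efficient of correlation must lie between -1 and +1")
    if n < 1:
        raise ValueError("n must be a positive number of pairs")
    return 0.6745 * (1 - r * r) / sqrt(n)

# Hypothetical case: r = 0.8 computed from 16 pairs of observations
r, n = 0.8, 16
pe = probable_error(r, n)
print(round(pe, 6))                         # 0.6745 * 0.36 / 4 = 0.060705
print(round(r - pe, 4), round(r + pe, 4))   # limits r - P.E. and r + P.E.
```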
Limitations of Correlation
The correlation study indicates the existence of correlation, but it does
not indicate the cause and effect relationship.
Correlation coefficient assumes a linear relationship regardless of whether the
assumption is correct or not.
Extreme values of the variables unduly affect the correlation
coefficient.
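Two of the limitations above can be illustrated with made-up figures. In the first case Y is completely determined by X through a curve, yet the co-efficient comes out as zero. In the second, one extreme pair turns a weak correlation into an apparently strong one. The sketch assumes the standard deviation-from-mean formula for Pearson's co-efficient; all data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's co-efficient of correlation (deviation-from-mean form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# 1. Non-linear relationship: Y = X squared, but r is zero
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x * x for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# 2. Extreme value: the same data with and without one extreme pair (20, 20)
base_x = [1, 2, 3, 4, 5]
base_y = [3, 1, 4, 2, 3]
r_without = pearson_r(base_x, base_y)
r_with = pearson_r(base_x + [20], base_y + [20])

print(r_curve)              # 0.0
print(round(r_without, 2))  # 0.14
print(round(r_with, 2))     # 0.97
```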
Coefficient of determination
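Only the heading survives here. As commonly defined, the coefficient of determination is the square of the coefficient of correlation (r²), read as the proportion of the variation in one variable that is associated with the other. A minimal sketch under that assumption, with a hypothetical function name and value of r:

```python
def coefficient_of_determination(r):
    """Square of the co-efficient of correlation (assumed standard definition)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("co-efficient of correlation must lie between -1 and +1")
    return r * r

r = 0.9
r_squared = coefficient_of_determination(r)
print(round(r_squared, 2))  # 0.81, i.e. about 81% of the variation is explained
```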
REGRESSION ANALYSIS
Regression analysis means the estimation or the prediction of the
unknown value of one variable from the known value of the other variable. It is a
statistical device used to study the relationship between two or more variables that
are related.
In the words of M. M. Blair, “Regression analysis is a mathematical measure
of the average relationship between two or more variables in terms of the
original units of the data”.
In regression analysis there are two types of variables. The variable whose value
is influenced or is to be predicted is called the dependent variable, and the variable
which influences the values, or is used for prediction, is called the independent
variable.
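The idea of predicting the dependent variable from the independent variable can be sketched with a simple regression line of Y on X, Y = a + bX. The notes do not give the equations, so this assumes the usual least-squares results b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares regression line of Y on X; returns (a, b) for Y = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    if sum_dx2 == 0:
        raise ValueError("the independent variable must not be constant")
    b = sum_dxdy / sum_dx2   # regression co-efficient of Y on X
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: X is the independent variable, Y the dependent variable
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
estimate = a + b * 6  # estimated Y for a known X of 6
print(round(a, 2), round(b, 2), round(estimate, 2))  # 2.2 0.6 5.8
```

The estimate is in the original units of Y, which is the point made in the Blair definition quoted above.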
Types of Regression
Regression analysis can be classified into:
Simple and Multiple Regressions: When there are only two variables
the regression equation obtained is called simple regression equation.
In multiple regression analysis there are more than two variables and
we try to find out the effect of two or more independent variables on
one dependent variable.
Linear and Non-linear Regression: If the values of the given
two variables are plotted on a graph, we get the curve of
regression. The regression is termed linear if the curve of
regression is a straight line. If the curve of regression is not
a straight line, it is known as non-linear regression.
Total or Partial Regression: In total regression analysis, all the
influencing independent variables are considered while estimating
the value of the dependent variable. But in partial regression analysis,
the influence of variables which are not relevant for a given purpose is
excluded.
***************