Correlation and Regression

The document discusses correlation and regression as statistical methods to quantify relationships between variables. Correlation measures the strength and direction of the association, while regression provides a mathematical model to predict the dependent variable based on the independent variable. It also outlines different types of regression models, including linear, non-linear, and multiple regression, along with stepwise regression modeling techniques.

Uploaded by

Shuvo Chakma
Copyright
© All Rights Reserved

DSM 1203: Applications of Statistical Methods in Disaster Management

Chapter 12
Correlation and Regression
What are Correlation and Regression?

Correlation and regression are statistical methods used to describe the
relationship between two variables. For example, if a person drives an
expensive car, we tend to assume that the person is financially well off.
Correlation and regression are used to quantify such a relationship numerically.
Correlation
What is a correlation?
A correlation reflects the strength and/or direction of the association between two or
more variables. Correlation can be defined as a measurement that is used to quantify the
relationship between variables.

• If an increase (or decrease) in one variable is accompanied by a corresponding
increase (or decrease) in another, the two variables are said to be directly correlated.

• Similarly, if an increase in one is accompanied by a decrease in the other, or vice
versa, the variables are said to be indirectly (inversely) correlated.

• If a change in one variable is not accompanied by any systematic change in the
other, the variables are uncorrelated.

Thus, correlation can be positive (direct correlation), negative (indirect
correlation), or zero. This relationship is given by the correlation coefficient.
1. A positive correlation means that both variables change in the same direction.
Example: As height increases, weight also increases.
2. A negative correlation means that the variables change in opposite directions.
Example: As coffee consumption increases, tiredness decreases.
3. A zero correlation means there’s no relationship between the variables.
Example: Coffee consumption is not correlated with height.
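The three cases above can be checked numerically. The following is a minimal sketch that computes Pearson's correlation coefficient by hand. The function name `pearson_r` and every data value are invented for illustration and do not come from the chapter.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data, made up only to illustrate the three cases.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 85]           # tends to rise with height
cups_coffee = [0, 1, 2, 3, 4]
tiredness = [9, 7, 6, 4, 2]                # tends to fall as coffee rises
drinker_height = [170, 165, 180, 165, 170]  # no linear pattern against coffee

r_positive = pearson_r(height_cm, weight_kg)
r_negative = pearson_r(cups_coffee, tiredness)
r_zero = pearson_r(cups_coffee, drinker_height)

print("height vs weight:   ", round(r_positive, 3))
print("coffee vs tiredness:", round(r_negative, 3))
print("coffee vs height:   ", round(r_zero, 3))
```

The sign of each result indicates the direction of the relationship, and values closer to +1 or -1 indicate a stronger linear association.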
Regression
Definition of Simple Regression

A regression model is a mathematical equation that describes the relationship between two or
more variables. A simple regression model includes only two variables: one independent and one
dependent. The dependent variable is the one being explained, and the independent variable is the
one used to explain the variation in the dependent variable.

Definition of Linear Regression

A (simple) regression model that gives a straight-line relationship between two variables is called
a linear regression model.

The two diagrams in Figure 13.1 show a linear and a nonlinear relationship between the
dependent variable food expenditure and the independent variable income. A linear relationship
between income and food expenditure, shown in Figure 13.1a, indicates that as income increases,
the food expenditure always increases at a constant rate. A nonlinear relationship between income
and food expenditure, as depicted in Figure 13.1b, shows that as income increases, the food
expenditure increases, although, after a point, the rate of increase in food expenditure is lower for
every subsequent increase in income.

Figure 13.1 Relationship between food expenditure and income. (a) Linear
relationship. (b) Nonlinear relationship.

Correlation and Regression Analysis

Both correlation and regression analysis are done to quantify the strength of the
relationship between two variables by using numbers. Graphically, correlation and
regression analysis can be visualized using scatter plots.

Correlation analysis is done to determine whether there is a relationship between
the variables being tested. Furthermore, a correlation coefficient such as
Pearson's correlation coefficient gives a signed numeric value that expresses both
the strength and the direction of the correlation. A scatter plot shows the
relationship between two variables x and y, with each point representing one
individual observation.

Regression analysis is used to determine the relationship between two variables such that
the value of the unknown variable can be estimated using the knowledge of the known
variables. The goal of linear regression is to find the best-fitted line through the data
points. For two variables, x and y, the regression analysis can be visualized as follows:

Correlation and Regression Formula

The most common way to conduct correlation and regression analysis is to use
Pearson's correlation coefficient and the method of least squares, respectively. The
correlation and regression formulas are given below:

Pearson's Correlation Coefficient:
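The formulas themselves did not survive in this copy of the chapter. As a sketch, the standard textbook forms are given here in the SS (sum of squares) notation that the chapter uses later. The symbols $a$ and $b$ for the intercept and slope of the fitted line $\hat{y} = a + bx$ are assumed here, not taken from the original.

```latex
\[
SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \qquad
SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}, \qquad
SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}
\]

\[
r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}
\]

\[
b = \frac{SS_{xy}}{SS_{xx}}, \qquad a = \bar{y} - b\,\bar{x}
\]
```

Because $SS_{xx}$ and $SS_{yy}$ are never negative, $r$ and $b$ always take the sign of $SS_{xy}$.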

Difference between Correlation and Regression

Correlation and regression are both used as statistical measurements to get a good understanding
of the relationship between variables. If the correlation coefficient is negative (or positive) then
the slope of the regression line will also be negative (or positive). The table given below
highlights the key difference between correlation and regression.
Correlation | Regression
----------- | ----------
Correlation is used to determine whether variables are related or not. | Regression is used to numerically describe how a dependent variable changes with a change in an independent variable.
Correlation tries to establish a linear relationship between variables. | It finds the best-fitted regression line to estimate an unknown variable on the basis of the known variable.
The variables can be used interchangeably. | The variables cannot be interchanged.
Correlation uses a signed numerical value to estimate the strength of the relationship between the variables. | Regression is used to show the impact of a unit change in the independent variable on the dependent variable.
Pearson's coefficient is the best measure of correlation. | The least-squares method is the best technique to determine the regression line.
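Two of the points in this comparison can be illustrated with a short sketch: the correlation coefficient is the same whichever variable is called x, whereas the regression slope changes when the variables are swapped, and the slope always carries the same sign as the correlation coefficient. The helper names and the data (years of driving experience against a monthly premium) are hypothetical.

```python
import math

def sums_of_squares(x, y):
    """Return (SS_xx, SS_yy, SS_xy) using the computational formulas."""
    n = len(x)
    ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xx, ss_yy, ss_xy

def correlation(x, y):
    ss_xx, ss_yy, ss_xy = sums_of_squares(x, y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def slope(x, y):
    """Least-squares slope for the regression of y on x."""
    ss_xx, _, ss_xy = sums_of_squares(x, y)
    return ss_xy / ss_xx

# Hypothetical data, invented for illustration only.
experience_years = [1, 3, 5, 8, 12]
premium = [90, 80, 72, 60, 48]

r_xy = correlation(experience_years, premium)
r_yx = correlation(premium, experience_years)
b_y_on_x = slope(experience_years, premium)   # premium explained by experience
b_x_on_y = slope(premium, experience_years)   # experience explained by premium

print("r (either order):", round(r_xy, 4), round(r_yx, 4))
print("slope of y on x: ", round(b_y_on_x, 4))
print("slope of x on y: ", round(b_x_on_y, 4))
```

Swapping the variables leaves r unchanged but gives a different slope, which is why the choice of dependent and independent variable matters in regression.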

What is a regression model?

A regression model determines a relationship between an independent variable and a
dependent variable by providing a function. Formulating a regression analysis helps you
predict the effects of the independent variable on the dependent one.

Example: If a person’s height increases by roughly the same amount for each
additional year of age, then age and height can be described using a linear
regression model.

Regression models are commonly used as statistical evidence for claims regarding
everyday facts.
What are the different types of regression models?
There are three different types of regression models:

1. Linear
2. Non-linear
3. Multiple
Let’s look at them in detail:

Linear regression model
A linear regression model is used to depict a relationship in which the dependent
variable changes at a constant rate as the independent variable changes. That is, the
dependent variable increases or decreases along a straight line.

In the graphical representation, the model is a straight line drawn through the data
points. Even if the points do not fall exactly on a straight line (which is almost
always the case with real data), we can still see a pattern and make sense of it.

For example, as the age of a person increases, the level of glucose in their body
increases as well.
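As a minimal sketch of how such a line is obtained, the following fits a least-squares line to a handful of age and glucose readings and uses it for a prediction. The function name `fit_line` and all numbers are hypothetical and chosen only for illustration.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = ss_xy / ss_xx          # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical readings, not taken from the chapter.
age = [25, 35, 45, 55, 65]
glucose = [80, 84, 91, 94, 101]

a, b = fit_line(age, glucose)
predicted_at_50 = a + b * 50

print("intercept a:", round(a, 2))
print("slope b:    ", round(b, 2))
print("predicted glucose at age 50:", round(predicted_at_50, 2))
```

The points do not lie exactly on the fitted line, but the line summarizes the overall upward pattern and can be used to estimate glucose for an age that was not observed.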

Non-linear regression model


In the non-linear regression model, the graph doesn’t show a linear progression.
Depending on how the response variable reacts to the input variable, the curve rises
or falls at a changing rate, showing how strong the effect on the response variable is
at different values of the input.

To decide whether a non-linear regression model is the best fit for your scenario,
look into your variables and their patterns. If the response variable does not change
at a constant rate as the input variable changes, you can choose to use a non-linear
model for your problem.

For example, a patient’s response to treatment can be good or bad depending on their
body’s tendency and willpower.

Multiple regression model


A multiple regression model is used when there is more than one independent variable
affecting a dependent variable. When predicting the outcome variable, it is important
to measure how each of the independent variables varies and how its changes affect
the output or target variable.

For example, the chances of a student failing their test can be dependent on various
input variables like hard work, family issues, health issues, etc.
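A multiple regression with two inputs can be sketched by solving the least-squares normal equations directly. Everything below is an assumption made for illustration: the helper names, the two inputs (study hours and days of illness), and the data, which were constructed to follow score = 50 + 4 * hours - 3 * sick_days exactly so that the fitted coefficients are easy to check.

```python
def solve_linear_system(A, rhs):
    """Solve A * coef = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [A | rhs], copied so the inputs are not modified.
    M = [list(row) + [value] for row, value in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(r) for r in rows]   # prepend a column of ones for the intercept
    k = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# Hypothetical data: (study hours, days of illness) for six students.
inputs = [(2, 1), (4, 0), (6, 3), (8, 2), (10, 5), (3, 4)]
score = [50 + 4 * h - 3 * d for h, d in inputs]

b0, b_hours, b_sick = fit_multiple(inputs, score)
print("intercept:", round(b0, 3))
print("effect of one extra study hour:", round(b_hours, 3))
print("effect of one extra sick day:  ", round(b_sick, 3))
```

Each coefficient describes the change in the outcome for a one-unit change in that input while the other input is held fixed. With real, noisy data the coefficients would only approximate the underlying effects.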
What is stepwise regression modeling?

Unlike the regression model types mentioned above, stepwise regression modeling is
more of a technique, used when several input variables affect one output variable.
The analyst first identifies the input variable that is most directly correlated with
the output and builds a model from it. The rest of the variables come into the picture
when he decides to refine the model.

The analyst may add the remaining inputs one after the other based on their significance
and the extent to which they affect the target variable.

For example, suppose vegetable prices have increased in a certain area. The reason
behind the event can be anything from natural calamities to transport and supply chain
problems. When an analyst decides to plot it on a graph, he will first pick the most
obvious reason, such as heavy rainfall in the agricultural regions.

Once the model is built, he can then add the rest of the affecting input variables into the
picture based on their occurrence and significance.
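The order in which inputs might be considered can be sketched by ranking each candidate by the absolute value of its correlation with the target. This is only a simplified stand-in for stepwise regression: real procedures test the significance of each variable at every step, which is omitted here. The function names, variable names, and data are all hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def rank_inputs(candidates, target):
    """Order candidate inputs by |r| with the target, strongest first."""
    scored = [(abs(pearson_r(values, target)), name)
              for name, values in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical observations over six periods, invented for illustration.
vegetable_price = [20, 22, 27, 30, 36, 41]
candidates = {
    "rainfall_mm": [50, 60, 90, 100, 130, 160],
    "fuel_price": [1.0, 1.2, 1.1, 1.3, 1.2, 1.4],
    "market_days": [5, 6, 5, 6, 5, 6],
}

entry_order = rank_inputs(candidates, vegetable_price)
print("suggested order of entry:", entry_order)
```

In this made-up data, rainfall is the most strongly correlated input, so it would form the first model. The remaining inputs would then be added one at a time and kept only if they improve the model.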

EXAMPLE 13–1

Find the least squares regression line for the data on incomes and food
expenditures of the seven households given in Table 13.1. Use income as the
independent variable and food expenditure as the dependent variable.

N.B. SS is the sum of squares.

Solution
(a) Based on theory and intuition, we expect the insurance premium to
depend on driving experience. Consequently, the insurance premium is a
dependent variable and driving experience is an independent variable in the
regression model. A new driver is considered a high risk by the insurance
companies, and he or she has to pay a higher premium for auto insurance. On
average, the insurance premium is expected to decrease with an increase in
the years of driving experience. Therefore, we expect a negative relationship
between these two variables. In other words, both the population correlation
coefficient and the population regression slope B are expected to be
negative.

(b) Table 13.5 shows the calculation of x, y, xy, x², and y².
