Simple Regression Model: Conference Paper
net/publication/329611627
All content following this page was uploaded by Mercedes Orús-Lacort on 13 December 2018.
Occasionally, we have two quantitative variables that may be related, and what we
want to study is: can we predict the value of one of them from the known values of the
other?
- Draw a graph showing the data of both variables. This graph is called a "scatter
plot".
- Calculate a formula that allows us to predict the value of one of these
variables from the other. This formula is called the "regression line".
- Check whether the regression line can be considered valid. To do so, we solve a
hypothesis test and calculate a ratio called the "coefficient of goodness of
fit" (also called R-squared, or the coefficient of determination).
The values of the two variables that we are studying are represented in this diagram.
We may encounter situations like the ones shown below:
First situation:
- The points are close together: This means that there is a strong relationship between
the two variables.
- The points are also oriented upward to the right: This means that the two variables
are directly (positively) related, i.e. when spending on Advertising increases,
the Benefits also increase.
Second situation:
- The points are not very close together: This means that there is not a strong relation
between the two variables, so if we calculate the regression line, it will not fit
very well.
- The points are also oriented upward to the right: This means that the two variables
are directly (positively) related, i.e. when spending on Advertising increases, the
Benefits also increase.
Third situation:
- The points are very dispersed: This means that there is no relation between the two
variables, and that it would not make sense to calculate a regression model.
Fourth situation:
- The points are close together: This means that there is a strong relationship between
the two variables.
- The points are also oriented downward to the right: This means that the two variables
are inversely (negatively) related, i.e. when spending on Advertising increases, the
Benefits decrease.
If we have data from two random variables that we think may be related, the
way to confirm whether that relationship exists is to calculate Pearson's correlation
coefficient rxy. The value of this coefficient is always between -1 and 1.
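$$
r_{xy} = \frac{S_{XY}}{S_X\,S_Y}
= \frac{\frac{1}{n-1}\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n-1}\sum (x_i-\bar{x})^2}\;\sqrt{\frac{1}{n-1}\sum (y_i-\bar{y})^2}}
= \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2\,\sum (y_i-\bar{y})^2}}
$$

$$
r_{xy} = \frac{\sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2\right)\left(\sum y_i^2 - 2\bar{y}\sum y_i + n\bar{y}^2\right)}}
$$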
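As a minimal sketch of how the correlation coefficient could be computed in practice, the following Python function applies the deviation form of the formula (sum of cross-products divided by the square root of the product of the sums of squares). The function name `pearson_r` and the sample data are assumptions made up for illustration; they do not come from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r_xy of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and of squared deviations; the 1/(n-1)
    # factors cancel between numerator and denominator.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up illustrative data (hypothetical, not from the paper)
advertising = [1, 2, 3, 4, 5]
benefits = [2, 4, 5, 4, 5]
r = pearson_r(advertising, benefits)
print(round(r, 4))  # prints 0.7746
```

Note that the function fails with a division by zero when one of the variables is constant, since the coefficient is undefined in that case.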
First situation:
In this case, rxy will have a positive sign, and its value will be close to 1, e.g. rxy = 0.976.
Second situation:
In this case, rxy will have a positive sign, but its value will not be as close to 1, e.g.
rxy = 0.676.
Third situation:
In this case, rxy may have a positive or negative sign, and its value will be closer to
0 than to 1 or -1, e.g. rxy = 0.215 or rxy = -0.215.
Fourth situation:
In this case, rxy will have a negative sign, and its value will be close to -1, for
example rxy = -0.915.
Using the regression line we can predict the value of one of the variables from the
other.
The variable whose value we are going to predict (say Y) is called the dependent
variable, and the other variable (say X) is called the independent variable.
We intend, therefore, to find a formula of the type Y = a + b·X that will allow us to
predict the value of Y from the value of X, so that the line fits the cloud of
points in the scatter plot as closely as possible.
For example, according to the four situations seen above, we could
have:
First situation:
Second situation:
Third situation:
Fourth situation:
Calculation of the values of "a" and "b"
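$$
b = \frac{S_{XY}}{S_X^2}
= \frac{\frac{1}{n-1}\sum (x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{n-1}\sum (x_i-\bar{x})^2}
= \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}
= \frac{\sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}}{\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2}
$$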
Once "b" (the slope) has been calculated, "a", called the y-intercept, is calculated as follows:
$$
a = \bar{y} - b\,\bar{x}
$$
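The two formulas for "b" and "a" can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fit_regression_line` and the data are hypothetical, chosen so that the arithmetic is easy to follow by hand.

```python
def fit_regression_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b = sum of cross-products / sum of squared deviations of X
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # a = y_bar - b * x_bar
    a = y_bar - b * x_bar
    return a, b

# Made-up illustrative data (hypothetical, not from the paper)
advertising = [1, 2, 3, 4, 5]
benefits = [2, 4, 5, 4, 5]
a, b = fit_regression_line(advertising, benefits)
print(f"Y = {a:.2f} + {b:.2f}*X")  # prints Y = 2.20 + 0.60*X

# Prediction for a new value of X, here X = 6
predicted = a + b * 6
```

With these numbers the sum of cross-products is 6 and the sum of squared deviations of X is 10, so b = 0.6 and a = 4 - 0.6·3 = 2.2.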
To know whether the regression model can be considered valid, we must solve the following
hypothesis test:
H0: β = 0
Ha: β ≠ 0
To solve this test, we calculate the test statistic, which follows a Student's t
distribution with n - 2 degrees of freedom, using the following formula:
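$$
t = \frac{b}{S_b} = \frac{b}{\sqrt{\dfrac{\frac{1}{n-2}\sum_{i=1}^{n}(y_i - a - b\,x_i)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}}
$$
where S_b is the standard error of b.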
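A minimal sketch of the test statistic, assuming the individual observations are available, could look as follows. The function name `slope_t_statistic` and the data are hypothetical and serve only as an illustration.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, s_b): the t statistic for H0: beta = 0 and the standard error of b."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Sum of squared residuals around the fitted line
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    # Standard error of b: residual variance (n - 2 degrees of freedom)
    # divided by the sum of squared deviations of X, then square root
    s_b = math.sqrt((sse / (n - 2)) / s_xx)
    return b / s_b, s_b

# Made-up illustrative data (hypothetical, not from the paper)
advertising = [1, 2, 3, 4, 5]
benefits = [2, 4, 5, 4, 5]
t_stat, s_b = slope_t_statistic(advertising, benefits)
print(round(t_stat, 4), round(s_b, 4))  # prints 2.1213 0.2828
```

For this data the residual sum of squares is 2.4, so S_b = sqrt((2.4 / 3) / 10) ≈ 0.2828 and t = 0.6 / 0.2828 ≈ 2.1213, with n - 2 = 3 degrees of freedom.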
Note that if we are given only the totals of the sums, and we do not know the
individual values of the variables X and Y, then we calculate the standard error as
shown below:
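$$
S_b = \sqrt{\dfrac{\frac{1}{n-2}\sum_{i=1}^{n}(y_i - a - b\,x_i)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
= \sqrt{\dfrac{\frac{1}{n-2}\left(\sum_{i=1}^{n} y_i^2 + n\,a^2 + b^2\sum_{i=1}^{n} x_i^2 - 2a\sum_{i=1}^{n} y_i - 2b\sum_{i=1}^{n} x_i y_i + 2ab\sum_{i=1}^{n} x_i\right)}{\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2}}
$$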
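When only the totals are known, the same quantity can be obtained from the expanded formula. The sketch below assumes the six totals n, Σx, Σy, Σx², Σy² and Σxy are given; the function name `standard_error_from_sums` and the numbers are hypothetical.

```python
import math

def standard_error_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Standard error of the slope b computed from summary totals only."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    # Expanded sums of squared deviations and cross-products
    s_xx = sum_x2 - 2 * x_bar * sum_x + n * x_bar ** 2
    s_xy = sum_xy - y_bar * sum_x - x_bar * sum_y + n * x_bar * y_bar
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Expanded sum of squared residuals sum (y_i - a - b*x_i)^2
    sse = (sum_y2 + n * a ** 2 + b ** 2 * sum_x2
           - 2 * a * sum_y - 2 * b * sum_xy + 2 * a * b * sum_x)
    # Rounding can make sse slightly negative when the fit is perfect
    sse = max(sse, 0.0)
    return math.sqrt((sse / (n - 2)) / s_xx)

# Totals of the made-up data x = 1..5, y = 2, 4, 5, 4, 5 (hypothetical)
s_b = standard_error_from_sums(n=5, sum_x=15, sum_y=20,
                               sum_x2=55, sum_y2=86, sum_xy=66)
print(round(s_b, 4))  # prints 0.2828
```

The expanded form is algebraically identical to the residual form, but it subtracts large totals from each other, so it can lose precision when the sums are large compared with the spread of the data.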
Calculate the P value:
$$
P\ \text{value} = 2\cdot P\left(t_{n-2} > |t_{test}|\right)
$$
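As an illustration of how this probability could be evaluated without statistical tables, the sketch below integrates the Student's t density numerically with Simpson's rule, using only the Python standard library. This numerical approach is an assumption of this example, not the paper's method; in practice a statistics library would normally be used. The function names `t_pdf` and `two_sided_p_value` are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=10_000):
    """Approximate 2 * P(T_df > |t_stat|); steps must be even."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    # Simpson's rule for the area under the density between 0 and |t|
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = total * h / 3
    # By symmetry, P(T > |t|) = 0.5 - P(0 < T < |t|)
    return 2 * (0.5 - central_area)

# t statistic 2.1213 with n - 2 = 3 degrees of freedom (made-up example values)
p = two_sided_p_value(2.1213, 3)
print(round(p, 3))  # prints 0.124
```

The sketch relies on `math.gamma`, which overflows for very large degrees of freedom, so it is only suitable for small and moderate samples.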
Therefore:
Another way to see whether the model fits well is to calculate the coefficient
R-squared, also called the coefficient of determination or coefficient of goodness of fit. To
calculate it, we use the following formula:
$$
R^2 = r_{xy}^2
$$
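A short sketch of this last calculation is given below. It also compares the result with 1 - SSE/SST, the share of the variation of Y explained by the line, which coincides with the squared correlation in simple regression. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy ** 2 / (s_xx * s_yy)

# Made-up illustrative data (hypothetical, not from the paper)
advertising = [1, 2, 3, 4, 5]
benefits = [2, 4, 5, 4, 5]
r2 = r_squared(advertising, benefits)
print(round(r2, 4))  # prints 0.6

# Cross-check: 1 - SSE/SST for the fitted line Y = 2.2 + 0.6*X
a, b = 2.2, 0.6
y_bar = sum(benefits) / len(benefits)
sse = sum((y - a - b * x) ** 2 for x, y in zip(advertising, benefits))
sst = sum((y - y_bar) ** 2 for y in benefits)
explained_share = 1 - sse / sst
```

In this example 60% of the variation in the made-up Benefits values is explained by the regression line.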