Chapter 2: Simple Regression Model
SIMPLE REGRESSION MODEL
PREPARED BY:
DR. SITI MULIANA SAMSI
2.1 Basic ideas of linear regression
The most widely used method of obtaining these estimates is Ordinary Least Squares (OLS), which has become so standard that its estimates are presented as a point of reference even when results from other estimation techniques are used.
Ordinary Least Squares (OLS) is a regression estimation technique that calculates the β’s so as to
minimize the sum of the squared residuals.
There are at least five important reasons for using OLS to estimate regression models:
1. OLS is relatively easy to use.
2. The goal of minimizing the sum of the squared residuals is quite appropriate from a theoretical point of view.
3. OLS estimates have a number of useful characteristics.
4. The sum of the residuals is exactly zero.
5. OLS can be shown to be the “best” estimator possible under a set of specific assumptions.
OLS estimates are simple enough that, if you had to, you could calculate them without
using a computer or a calculator (for a single-independent-variable model). Indeed, in the
“dark ages” before computers and calculators, econometricians calculated OLS estimates
by hand!
As was noted previously, the formulas for OLS estimation for a regression equation with one independent variable are Equations 2.4 and 2.5:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N}(X_i - \bar{X})^2} \qquad (2.4)$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} \qquad (2.5)$$
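As a concrete illustration, here is a minimal Python sketch that applies Equations 2.4 and 2.5 directly. The data are hypothetical numbers invented for this example, not the values from Table 2.1:

```python
import numpy as np

# Hypothetical sample data (illustrative only, not the textbook's Table 2.1)
X = np.array([5.0, 9.0, 13.0, 12.0, 10.0, 11.0, 8.0])    # independent variable
Y = np.array([140., 157., 205., 198., 162., 174., 150.])  # dependent variable

# Equation 2.4: slope estimate
beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

# Equation 2.5: intercept estimate
beta0_hat = Y.mean() - beta1_hat * X.mean()

print(f"beta0_hat = {beta0_hat:.2f}, beta1_hat = {beta1_hat:.2f}")
```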
If we undertake the calculations outlined in Table 2.1 and substitute them into Equations 2.4 and 2.5, we obtain an estimated intercept of $\hat{\beta}_0 = 103.4$ along with the corresponding slope estimate $\hat{\beta}_1$.

What does $\hat{\beta}_0 = 103.4$ mean? $\hat{\beta}_0$ is the estimate of the constant or intercept term. In our equation, it means that weight equals 103.4 pounds when height equals zero. While it might be tempting to say that the average weight of an adult is 103.4 pounds (about 46.9 kg), no adult has a height of zero, so the intercept should not be interpreted literally; it simply fixes the point at which the estimated regression line crosses the vertical axis.
• Total, explained and residual sum of squares
Econometricians use the squared variations of Y around its mean as a measure of the amount of variation to be explained by the regression. This computed quantity is usually called the total sum of squares, or TSS, and is written as:

$$\text{TSS} = \sum_{i=1}^{N}(Y_i - \bar{Y})^2$$
For Ordinary Least Squares, the total sum of squares has two components: variation that can be explained by the regression and variation that cannot:

$$\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sum_{i=1}^{N}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{N} e_i^2 \qquad (2.13)$$

where $e_i = Y_i - \hat{Y}_i$ is the residual for observation $i$.
Figure 2.2 illustrates the decomposition of variance for a simple regression model. The estimated values of $Y_i$ lie on the estimated regression line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$. The variation of Y around its mean can be decomposed into two parts: (1) the difference between the estimated value $\hat{Y}_i$ and the mean value $\bar{Y}$; and (2) the difference between the actual value $Y_i$ and the estimated value $\hat{Y}_i$.
The first component of Equation 2.13 measures the amount of the squared deviation of $Y_i$ from its mean that is explained by the regression line. This component of the total sum of the squared deviations, called the explained sum of squares, or ESS, is attributable to the fitted regression line:

$$\text{ESS} = \sum_{i=1}^{N}(\hat{Y}_i - \bar{Y})^2$$
The unexplained portion of TSS (that is, unexplained in an empirical sense by the estimated regression equation) is called the residual sum of squares, or RSS:

$$\text{RSS} = \sum_{i=1}^{N} e_i^2$$
We can see from Equation 2.13 that the smaller the RSS is relative to the TSS, the better
the estimated regression line fits the data. OLS is the estimating technique that minimizes
the RSS and therefore maximizes the ESS for a given TSS.
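To make the decomposition concrete, the following Python sketch (again with hypothetical data) computes TSS, ESS, and RSS for an OLS fit and confirms that TSS = ESS + RSS:

```python
import numpy as np

# Hypothetical data (illustrative only)
X = np.array([5.0, 9.0, 13.0, 12.0, 10.0, 11.0, 8.0])
Y = np.array([140., 157., 205., 198., 162., 174., 150.])

# OLS fit (Equations 2.4 and 2.5)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X          # fitted values
e = Y - Y_hat                # residuals

TSS = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
RSS = np.sum(e ** 2)                    # residual sum of squares

print(f"TSS = {TSS:.2f}, ESS = {ESS:.2f}, RSS = {RSS:.2f}")
print("TSS == ESS + RSS:", np.isclose(TSS, ESS + RSS))
```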
• Describing the Overall Fit of the Estimated Model
A good estimated regression equation will explain the variation of the dependent variable.
Looking at the overall fit of an estimated model is useful not only for evaluating the
quality of the regression, but also for comparing models that have different data sets or
combinations of independent variables.
The simplest commonly used measure of fit is R², or the coefficient of determination.
R² is the proportion of the variation in Y that is explained by the variation in X. It is the ratio of the explained sum of squares to the total sum of squares:

$$R^2 = \frac{\text{ESS}}{\text{TSS}}$$

or, equivalently,

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$$
The higher R² is, the closer the estimated regression equation fits the sample data.
Measures of this type are called “goodness of fit” measures. R² measures the percentage of the variation of Y around its mean $\bar{Y}$ that is explained by the regression equation.
R² must lie in the interval 0 ≤ R² ≤ 1. A value of R² close to one shows an excellent overall fit, whereas a value near zero shows a failure of the estimated regression equation to explain the values of $Y_i$ any better than they could be explained by the sample mean $\bar{Y}$.
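Continuing the same hypothetical example, the sketch below computes R² both ways and shows that the two expressions agree for an OLS fit that includes an intercept:

```python
import numpy as np

# Same illustrative data as the earlier sketches
X = np.array([5.0, 9.0, 13.0, 12.0, 10.0, 11.0, 8.0])
Y = np.array([140., 157., 205., 198., 162., 174., 150.])

b1, b0 = np.polyfit(X, Y, 1)   # OLS slope and intercept
Y_hat = b0 + b1 * X

TSS = np.sum((Y - Y.mean()) ** 2)
RSS = np.sum((Y - Y_hat) ** 2)
ESS = np.sum((Y_hat - Y.mean()) ** 2)

# R-squared two ways; identical for OLS with an intercept
print(f"R^2 = {ESS / TSS:.3f} (ESS/TSS) = {1 - RSS / TSS:.3f} (1 - RSS/TSS)")
```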
Figure 2.3: X and Y are not related; in such a case, R² would be 0.
Figure 2.4: A set of data for X and Y that can be “explained” quite well with a regression
line (R² = .95).
Figure 2.5: A perfect fit: all the data points are on the regression line, and the resulting R²
is 1.
With cross-sectional data, researchers often get a low R² because the observations (say, countries) differ in ways that are not easily quantified. In such a situation, an R² of 0.50 (50 percent) might be considered a good fit, and researchers would tend to focus on identifying the variables that have a substantive impact on the dependent variable, not on R².
R² for the Math S.A.T. Example:
From the data given in Table 2-4, we obtain an r² of approximately 0.79 for our math S.A.T. score example. Since r² can at most be 1, the computed r² is pretty high: the income variable explains about 79 percent of the variation in math S.A.T. scores. In this case we can say that the sample regression in the figure above gives an excellent fit.
The coefficient of correlation, r, measures the strength of the linear relationship between the two variables Y and X. It can be computed as:

$$r = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i - \bar{X})^2 \sum_{i=1}^{N}(Y_i - \bar{Y})^2}}$$

In the two-variable model, r takes the sign of the estimated slope coefficient and satisfies $r^2 = R^2$, so $r = \pm\sqrt{r^2}$.
In our example, $r = +\sqrt{0.79} \approx 0.89$: math S.A.T. scores and annual family income are highly positively correlated.
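A quick numerical check, using the same hypothetical data as the earlier sketches, confirms that r computed from the definition matches numpy's np.corrcoef and that its square equals the regression R²:

```python
import numpy as np

# Hypothetical data (same illustrative numbers as the earlier sketches)
X = np.array([5.0, 9.0, 13.0, 12.0, 10.0, 11.0, 8.0])
Y = np.array([140., 157., 205., 198., 162., 174., 150.])

# r from the definition
r = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sqrt(
    np.sum((X - X.mean()) ** 2) * np.sum((Y - Y.mean()) ** 2)
)

# R^2 from an OLS fit of Y on X (with intercept)
b1, b0 = np.polyfit(X, Y, 1)
Y_hat = b0 + b1 * X
R2 = 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)

print(f"r = {r:.3f}, numpy agrees: {np.isclose(r, np.corrcoef(X, Y)[0, 1])}")
print(f"r^2 = {r**2:.3f}, R^2 = {R2:.3f}")
```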
• Prediction with the simple regression model
Dependent Variable: Y
Method: Least Squares
Date: 04/08/21  Time: 15:29
Sample: 1 10
Included observations: 10
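The EViews output above reports an OLS fit for 10 observations; once the coefficient estimates are available, prediction amounts to plugging a new X value into the estimated line. A minimal Python sketch, under the same hypothetical-data assumptions as the earlier examples rather than the EViews sample:

```python
import numpy as np

# Hypothetical data and OLS fit (illustrative values, not the EViews sample)
X = np.array([5.0, 9.0, 13.0, 12.0, 10.0, 11.0, 8.0])
Y = np.array([140., 157., 205., 198., 162., 174., 150.])
b1, b0 = np.polyfit(X, Y, 1)

def predict(x_new: float) -> float:
    """Point prediction from the estimated line: Y_hat = b0 + b1 * x_new."""
    return b0 + b1 * x_new

# Predict Y at a hypothetical new value of X
print(f"Predicted Y at X = 10.5: {predict(10.5):.2f}")
```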