
CHAPTER 2

SIMPLE REGRESSION
MODEL

PREPARED BY:
DR. SITI MULIANA SAMSI
2.1 Basic ideas of linear regression

 Econometricians use regression analysis to make quantitative estimates of economic relationships that previously have been completely theoretical in nature.
 To predict the direction of the change, you need knowledge of economic theory and the general characteristics of the product in question.
 To predict the amount of the change, though, you need a sample of data, and you need a way to estimate the relationship. The most frequently used method for estimating such a relationship in econometrics is regression analysis.
• Dependent variable, independent variable, coefficient, and error
term.
 Regression analysis is a statistical technique that attempts to “explain” movements in one
variable, the dependent variable, as a function of movements in a set of other variables,
called the independent (or explanatory) variables, through the quantification of one or
more equations. For example, a demand equation of the form Q = β0 + β1P + β2PS + β3Yd.
 Here Q is the dependent variable and P, PS, and Yd are the independent variables. Regression
analysis is a natural tool for economists because most economic propositions can be stated
in such equations. For example, the quantity demanded is a function of price, the prices of
substitutes, and income.
 If the price of a good increases by one unit, then the quantity demanded decreases on
average by a certain amount, depending on the price elasticity of demand.
Single-Equation Linear Models
 The simplest single-equation regression model is: Y = β0 + β1X

Figure 1.1 Graphical Representation of the Coefficients of the Regression Line  


 The graph of the equation Y = β0 + β1X is linear with a constant slope equal to: β1 = ΔY/ΔX.
 Y=dependent variable, X=independent variable, β1=coefficients and slope coefficient, β0 =
constant or intercept
 The β’s are the coefficients that determine the coordinates of the straight line at any point. β0 is the constant or intercept term; it indicates the value of Y when X equals zero. β1 is the slope coefficient, and it indicates the amount that Y will change when X increases by one unit.
 The line in Figure 1.1 illustrates the relationship between the coefficients and the graphical meaning of the regression equation. As can be seen from the diagram, Equation Y = β0 + β1X is indeed linear.
 The slope coefficient, β1, shows the response of Y to a one-unit increase in X. Much of the emphasis in regression analysis is on slope coefficients such as β1. In Figure 1.1, for example, if X were to increase by one from X1 to X2 (ΔX), the value of Y in Equation 1.3 would increase from Y1 to Y2 (ΔY). For linear (i.e., straight-line) regression models, the response in the predicted value of Y due to a change in X is constant and equal to the slope coefficient β1: ΔY/ΔX = β1.
• Reasons for having the random error term
 The Stochastic Error Term
 Besides the variation in the dependent variable (Y) that is caused by the independent
variable (X), there is almost always variation that comes from other sources as well. This
additional variation comes in part from omitted explanatory variables (e.g., X2 and X3).
 This variation probably comes from sources such as omitted influences, measurement
error, incorrect functional form, or purely random and totally unpredictable occurrences.
By random we mean something that has its value determined entirely by chance.
 Econometricians admit the existence of unexplained variation (“error”) by explicitly including a stochastic (or random) error term in their regression models. A stochastic error term is a term that is added to a regression equation to introduce all of the variation in Y that cannot be explained by the included X’s. The error term (sometimes called a disturbance term) usually is referred to with the symbol epsilon (ε), although other symbols (like μ) are also used.
 The addition of a stochastic error term (ε) to Equation Y = β0 + β1X results in a typical regression equation: Y = β0 + β1X + ε
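 To make the role of the error term concrete, here is a minimal Python sketch (with made-up values for β0, β1, and the error spread, not values taken from these slides) that generates data from Y = β0 + β1X + ε; each observed Y deviates from the straight line by its own random disturbance:

    import numpy as np

    # Illustrative values only; beta0, beta1, and the error spread are not from the slides
    beta0, beta1 = 103.4, 6.4
    rng = np.random.default_rng(0)

    X = np.linspace(0.0, 10.0, 20)               # independent variable
    epsilon = rng.normal(0.0, 5.0, size=X.size)  # stochastic error term
    Y = beta0 + beta1 * X + epsilon              # observed Y = deterministic part + disturbance

    print(Y[:3] - (beta0 + beta1 * X[:3]))       # the gaps are exactly the epsilons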
2.2 Estimation of parameters by Ordinary Least Squares
(OLS)

 The most widely used method of obtaining these estimates is Ordinary Least Squares (OLS), which has become so standard that its estimates are presented as a point of reference even when results from other estimation techniques are used.
 Ordinary Least Squares (OLS) is a regression estimation technique that calculates the β’s so as to minimize the sum of the squared residuals.
 There are at least five important reasons for using OLS to estimate regression models:
1. OLS is relatively easy to use.
2. The goal of minimizing the sum of the squared residuals is quite appropriate from a theoretical point of view.
3. OLS estimates have a number of useful characteristics.
4. The sum of the residuals is exactly zero.
5. OLS can be shown to be the “best” estimator possible under a set of specific assumptions.
 OLS estimates are simple enough that, if you had to, you could calculate them without
using a computer or a calculator (for a single independent-variable model). Indeed, in the
“dark ages” before computers and calculators, econometricians calculated OLS estimates
by hand!
 As was noted previously, the formulas for OLS estimation for a regression equation with one independent variable are Equations 2.4 and 2.5:
β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²   (2.4)
β̂0 = Ȳ − β̂1X̄   (2.5)
 If we undertake the calculations outlined in Table 2.1 and substitute them into Equations 2.4 and 2.5, we obtain these values:

 What does β̂0 = 103.4 mean? β̂0 is the estimate of the constant or intercept term. In our equation, it means that weight equals 103.4 pounds when height equals zero. While it might be tempting to say that the average weight of an adult is 103.4 pounds (about 46.9 kg), the intercept should not be read literally, because a height of zero lies far outside the range of the sample data.
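 As a numerical check on Equations 2.4 and 2.5, the following Python sketch computes the OLS slope and intercept directly from the deviation-from-the-mean formulas. The small height–weight sample is invented for illustration; it is not the Table 2.1 data:

    import numpy as np

    # Invented sample (not the Table 2.1 data): height over five feet (inches), weight (pounds)
    X = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
    Y = np.array([120.0, 130.0, 135.0, 140.0, 150.0, 155.0, 160.0, 165.0])

    x_dev = X - X.mean()
    y_dev = Y - Y.mean()

    beta1_hat = (x_dev * y_dev).sum() / (x_dev ** 2).sum()  # Equation 2.4
    beta0_hat = Y.mean() - beta1_hat * X.mean()             # Equation 2.5

    print(beta1_hat, beta0_hat)
    # np.polyfit(X, Y, 1) returns the same slope and intercept and can serve as a cross-check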
• Total, explained and residual sum of squares
 Econometricians use the squared variations of Y around its mean as a measure of the amount of variation to be explained by the regression. This computed quantity is usually called the total sum of squares, or TSS, and is written as:
TSS = Σ(Yi − Ȳ)²
 For Ordinary Least Squares, the total sum of squares has two components, variation that can be explained by the regression and variation that cannot:
Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²,  that is, TSS = ESS + RSS
 This is usually called the decomposition of variance.


  

Figure 2.2 Decomposition of the Variance in Y

 Figure 2.2 illustrates the decomposition of variance for a simple regression model. The estimated values of Yi lie on the estimated regression line Ŷ = β̂0 + β̂1X. The variation of Y around its mean can be decomposed into two parts: (1) the difference between the estimated value of Y (Ŷi) and the mean value of Y (Ȳ); and (2) the difference between the actual value of Y and the estimated value of Y (Yi − Ŷi).
 The first component of Equation 2.13 measures the amount of the squared deviation of Yi from its mean that is explained by the regression line. This component of the total sum of the squared deviations, called the explained sum of squares, or ESS, is attributable to the fitted regression line.
 The unexplained portion of TSS (that is, unexplained in an empirical sense by the
estimated regression equation), is called the residual sum of squares, or RSS.
 We can see from Equation 2.13 that the smaller the RSS is relative to the TSS, the better
the estimated regression line fits the data. OLS is the estimating technique that minimizes
the RSS and therefore maximizes the ESS for a given TSS.
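 The decomposition of variance can be verified numerically. The sketch below (Python, reusing the invented sample from the OLS sketch) shows that TSS = ESS + RSS up to floating-point error and that the OLS residuals sum to zero:

    import numpy as np

    # Same invented sample as in the OLS sketch above
    X = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
    Y = np.array([120.0, 130.0, 135.0, 140.0, 150.0, 155.0, 160.0, 165.0])

    beta1_hat, beta0_hat = np.polyfit(X, Y, 1)  # OLS estimates
    Y_hat = beta0_hat + beta1_hat * X           # fitted values on the regression line
    e = Y - Y_hat                               # residuals

    TSS = ((Y - Y.mean()) ** 2).sum()      # total sum of squares
    ESS = ((Y_hat - Y.mean()) ** 2).sum()  # explained sum of squares
    RSS = (e ** 2).sum()                   # residual sum of squares

    print(TSS, ESS + RSS)  # the two agree: TSS = ESS + RSS
    print(e.sum())         # OLS residuals sum to (essentially) zero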
• Describing the Overall Fit of the Estimated Model

 A good estimated regression equation will explain the variation of the dependent variable.
 Looking at the overall fit of an estimated model is useful not only for evaluating the
quality of the regression, but also for comparing models that have different data sets or
combinations of independent variables.
 The simplest commonly used measure of fit is R², or the coefficient of determination.
 R² = The proportion of the variation in Y being explained by the variation in X.
 R² is the ratio of the explained sum of squares to the total sum of squares:
R² = ESS / TSS
or, equivalently,
R² = 1 − RSS / TSS
 The higher R² is, the closer the estimated regression equation fits the sample data. Measures of this type are called “goodness of fit” measures. R² measures the percentage of the variation of Y around its mean Ȳ that is explained by the regression equation.
 R² must lie in the interval 0 ≤ R² ≤ 1. A value of R² close to one shows an excellent overall fit, whereas a value near zero shows a failure of the estimated regression equation to explain the values of Yi better than could be explained by the sample mean Ȳ.
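 A short Python sketch (same invented sample as above) shows that the two forms of R² give the same value:

    import numpy as np

    # Same invented sample as above
    X = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
    Y = np.array([120.0, 130.0, 135.0, 140.0, 150.0, 155.0, 160.0, 165.0])

    beta1_hat, beta0_hat = np.polyfit(X, Y, 1)
    Y_hat = beta0_hat + beta1_hat * X

    TSS = ((Y - Y.mean()) ** 2).sum()
    ESS = ((Y_hat - Y.mean()) ** 2).sum()
    RSS = ((Y - Y_hat) ** 2).sum()

    print(ESS / TSS)      # R-squared as explained / total variation
    print(1 - RSS / TSS)  # equivalent form; always between 0 and 1 for OLS with an intercept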
 Figure 2.3: X and Y are not related; in such a case, R² would be 0.

 Figure 2.4: A set of data for X and Y that can be “explained” quite well with a regression
line (R² = .95).
 Figure 2.5: A perfect fit: all the data points are on the regression line, and the resulting R²
is 1.
 In cross-sectional data, we often get a low R² because the observations (say, countries) differ in ways that are not easily quantified. In such a situation, an R² of 0.50 (or 50%) might be considered a good fit, and researchers would tend to focus on identifying the variables that have a substantive impact on the dependent variable, not on R².
 R² for the Math S.A.T. Example:
 From the data given in Table 2-4, we obtain the following r² value for our math S.A.T. score example: r² ≈ 0.79.
 Since r² can at most be 1, the computed r² is pretty high. In our math S.A.T. example, the income variable explains about 79 percent of the variation in math S.A.T. scores. In this case we can say that the sample regression in the figure above gives an excellent fit.
  
 The coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables Y and X, and r can be computed as follows:
r = Σ(Xi − X̄)(Yi − Ȳ) / √[Σ(Xi − X̄)² · Σ(Yi − Ȳ)²]
In the two-variable case, r is also the square root of r², taking the sign of the slope coefficient.
 Thus, for the math S.A.T. example, r ≈ +√0.787 ≈ 0.89.
 In our example, math S.A.T. scores and annual family income are highly positively
correlated.
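 The link between r and R² can be checked in Python as well (same invented sample as above); np.corrcoef gives r directly, and it equals the square root of R² with the sign of the slope:

    import numpy as np

    # Same invented sample as above
    X = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
    Y = np.array([120.0, 130.0, 135.0, 140.0, 150.0, 155.0, 160.0, 165.0])

    r = np.corrcoef(X, Y)[0, 1]                # sample correlation coefficient
    beta1_hat, beta0_hat = np.polyfit(X, Y, 1)
    Y_hat = beta0_hat + beta1_hat * X
    R2 = 1 - ((Y - Y_hat) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()

    print(r)                                   # close to +1: strong positive linear relationship
    print(np.sign(beta1_hat) * np.sqrt(R2))    # the same number: in simple regression r**2 = R**2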
• Prediction with the simple regression
model
Dependent Variable: Y
Method: Least Squares
Date: 04/08/21   Time: 15:29
Sample: 1 10
Included observations: 10

Variable              Coefficient   Std. Error   t-Statistic   Prob.
C                     432.4138      16.90607     25.57742      0.0000
X                     0.001332      0.000245     5.435396      0.0006

R-squared             0.786914     Mean dependent var      507.0000
Adjusted R-squared    0.760278     S.D. dependent var      63.77913
S.E. of regression    31.22715     Akaike info criterion   9.897309
Sum squared resid     7801.078     Schwarz criterion       9.957826
Log likelihood       -47.48655     Hannan-Quinn criter.    9.830922
F-statistic           29.54353     Durbin-Watson stat      0.842054
Prob(F-statistic)     0.000619
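 As a sketch of prediction with the estimated equation Ŷ = 432.4138 + 0.001332·X from the output above (the income value and its units below are hypothetical, chosen only for illustration):

    # Coefficients taken from the output above; the income units are not stated in the
    # slides, so the example income below is purely illustrative
    beta0_hat, beta1_hat = 432.4138, 0.001332

    income = 50000                                    # hypothetical annual family income
    predicted_score = beta0_hat + beta1_hat * income  # point prediction from the fitted line
    print(predicted_score)                            # about 499

    # Each one-unit increase in income raises the predicted score by 0.001332,
    # so an extra 10,000 in income is associated with roughly 13 more points.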
• Hypothesis testing (t-test)
 Hypothesis testing is used in a variety of settings. Example: the Food and Drug Administration (FDA) tests new products before allowing their sale.
 If the sample of people exposed to the new product shows some side effect significantly
more frequently than would be expected to occur by chance, the FDA is likely to withhold
approval of marketing that product.
H0: β ≤ 0 (No side effect)
HA: β > 0 (Cause side effect)
 We cannot prove that a given theory is “correct” using hypothesis testing, but we can reject a given hypothesis with a certain level of confidence.
 The first step in hypothesis testing is to state the hypotheses to be tested. This should be
done before the equation is estimated.
 The null hypothesis is a statement of the values that the researcher does not expect. The
notation used is “H0:” followed by a statement of the range of values you do not expect.
For example, if you expect a positive coefficient, then you don’t expect a zero or negative
coefficient, and the null hypothesis is:
Null hypothesis H0: β ≤ 0 (the values you do not expect)
 The alternative hypothesis is a statement of the values that the researcher expects. The
notation used is “HA:” followed by a statement of the range of values you expect. For
example, if you expect a positive coefficient, then the alternative hypothesis is:
Alternative hypothesis HA: β > 0 (the values you expect)
 Thus, we can state the null and alternative hypotheses as follows:
H0: β ≤ 0
HA: β > 0
These hypotheses are for a one-sided test.
 Another approach is to use a two-sided test (or a two-tailed test) in which the alternative
hypothesis has values on both sides of the null hypothesis. For a two-sided test around
zero, the null and alternative hypotheses are:
H0: β = 0
HA: β ≠ 0
These hypotheses are for a two-sided test.
Decision Rules of Hypothesis Testing

 A decision rule is a method of deciding whether to reject a null hypothesis.
 A decision rule should be formulated before regression estimates are obtained. The range of possible values of the test statistic is divided into two regions, an “acceptance” region and a “rejection” region, where the terms are expressed relative to the null hypothesis.
 To define these regions, we must determine a critical value (or, for a two-tailed test, two critical values) of the test statistic. The critical value is a value that divides the “acceptance” region from the rejection region when testing a null hypothesis.
 To use a decision rule, we need to select a critical value.
 Suppose that the critical value is 1.8. If the observed t-value is greater than 1.8, we can reject the null hypothesis that β is zero or negative. To see this, take a look at Figure 2.6. Any t-value above 1.8 falls into the rejection region, whereas any t-value below 1.8 falls into the “acceptance” region.
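 A minimal Python sketch of this one-sided decision rule, using the slope estimate and standard error from the regression output earlier; the 5 percent significance level and the n − 2 = 8 degrees of freedom are assumptions made for illustration:

    from scipy import stats

    beta1_hat = 0.001332   # slope estimate from the output above
    se_beta1 = 0.000245    # its standard error
    t_stat = beta1_hat / se_beta1   # observed t-value for H0: beta <= 0 vs HA: beta > 0

    df = 10 - 2                     # n - 2 degrees of freedom in simple regression
    t_crit = stats.t.ppf(0.95, df)  # one-sided 5% critical value (about 1.86)

    if t_stat > t_crit:
        print("Reject H0: the slope is significantly positive")  # rejection region
    else:
        print("Cannot reject H0")                                 # "acceptance" region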
