
Multiple Regression Analysis: Further Issues

BASIC ECONOMETRICS
Elan Satriawan
M. Ryan Sanjaya

FEB UGM
10 May 2013
The effect of data scaling on OLS statistics
• Sometimes it is more intuitive to rescale the variable(s) used in a regression
• E.g., using per capita GDP rather than GDP, wealth in million Rp rather than wealth in Rp, etc.
• Data scaling changes the estimated parameters but not the statistical significance:
  ŷ/m = b̂0/m + (b̂1/m)x1 + (b̂2/m)x2
• Example: wealth in Indonesia
• IFLS4 (2007)
• Wealth (explicit): house, properties, vehicles, household equipment, saving/deposit accounts, stocks, etc.
• Rescale the dependent variable
• First regression (wealth in Rp) [regression output on the slide not reproduced]
• Second regression (wealth in million Rp) [regression output on the slide not reproduced]
• Data scaling changes the estimated parameters but not the statistical significance (rescaling the dependent variable):
  • The estimated parameters in the second regression are equal to the estimated parameters in the first regression divided by 1 million
  • The same applies to the standard errors and confidence intervals
  • F-statistics, t-statistics, p-values, and R² all remain the same
• The same holds when we rescale an independent variable:
  • Define age_mo = age*12 (age in months)
  • First regression: wealth on age, male, hieduc
  • Second regression: wealth on age_mo, male, hieduc
• First regression (wealth on age, male, hieduc) [regression output on the slide not reproduced]
• Second regression (wealth on age_mo, male, hieduc) [regression output on the slide not reproduced]
• Data scaling changes the estimated parameters but not the statistical significance (rescaling an independent variable):
  • The estimated parameters in the second regression equal those in the first, except for the rescaled variable (age_mo)
  • The estimated parameter for age_mo = the estimated parameter for age divided by 12
  • The same applies to the standard error and confidence interval of age_mo
  • F-statistics, t-statistics, p-values, and R² all remain the same (a small simulation below illustrates both rescalings)
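A minimal Python sketch of both rescalings, using simulated data (IFLS4 itself is not bundled with these slides, and the lecture's own output came from a statistics package); the variable names follow the slides, everything else is assumed for illustration:

import numpy as np
import statsmodels.api as sm

# Simulated stand-ins for the IFLS4 variables used on the slides
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 60, n)
male = rng.integers(0, 2, n).astype(float)
hieduc = rng.integers(0, 2, n).astype(float)
wealth = 5e6 + 2e5 * age + 1e6 * hieduc + rng.normal(0, 1e6, n)  # in Rp

X = sm.add_constant(np.column_stack([age, male, hieduc]))
res_rp = sm.OLS(wealth, X).fit()        # first regression: wealth in Rp
res_mn = sm.OLS(wealth / 1e6, X).fit()  # second regression: wealth in million Rp

print(res_rp.params / res_mn.params)                # every ratio equals 1e6
print(np.allclose(res_rp.tvalues, res_mn.tvalues))  # True: t-stats unchanged
print(np.isclose(res_rp.rsquared, res_mn.rsquared)) # True: identical R^2

# Rescaling an independent variable instead: age in months
X_mo = sm.add_constant(np.column_stack([age * 12, male, hieduc]))
res_mo = sm.OLS(wealth, X_mo).fit()
print(res_rp.params[1] / res_mo.params[1])          # = 12: only the age slope changes
print(np.allclose(res_rp.tvalues, res_mo.tvalues))  # True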
• Standardized regression (beta coefficients) (material taken from Gujarati 2004)
• Instead of regressing y on x, we may want to regress yi* on xi*, where:
  yi* = (yi − ȳ)/sy   (sy: sample standard deviation of y)
  xi* = (xi − x̄)/sx   (sx: sample standard deviation of x)
• It can be shown that ȳ* = x̄* = 0 and s²y* = s²x* = 1
Proof
Assume a sample of n = 50 observations with ȳ = 10 and sy = 4 (for example y1 = 20, y2 = 5, y3 = 5, y4 = 8, ..., y50 = 10).
By definition yi* = (yi − 10)/4, so the mean of the standardized variable is

  ȳ* = (1/50) Σ (yi − 10)/4
     = (1/50) [(20 − 10)/4 + (5 − 10)/4 + (5 − 10)/4 + (8 − 10)/4 + ... + (10 − 10)/4]
     = (1/4)(1/50) [(20 − 10) + (5 − 10) + (5 − 10) + (8 − 10) + ... + (10 − 10)]
     = (1/sy) · Σ(yi − ȳ)/n = 0

since deviations from the mean always sum to zero.
By definition var(y) = s²y = Σ(yi − ȳ)²/(n − 1); yi* = (yi − ȳ)/sy; and ȳ* = 0. Then

  s²y* = Σ(yi* − ȳ*)²/(n − 1)
       = Σ[(yi − ȳ)/sy − 0]²/(n − 1)
       = Σ(yi − ȳ)²/[s²y(n − 1)]
       = s²y(n − 1)/[s²y(n − 1)]
       = 1
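A quick numerical check of these two results, with a made-up sample (any numbers work):

import numpy as np

y = np.array([20.0, 5.0, 5.0, 8.0, 10.0, 12.0, 9.0, 11.0])  # arbitrary sample
y_star = (y - y.mean()) / y.std(ddof=1)  # ddof=1: sample sd, dividing by n-1

print(y_star.mean())       # ~0, up to floating-point error
print(y_star.var(ddof=1))  # 1.0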
• The standardized regression is
  yi* = b1* + b2*xi* + ui*
• It can be shown that b1* = 0 (see below), thus
  yi* = b2*xi* + ui*
• For example:
  investment* = 0.94GDP*
• How would we interpret b2* = 0.94?
  • If (standardized) GDP increases by one standard deviation, (standardized) investment increases, on average, by about 0.94 standard deviations
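A sketch of how such a beta coefficient is obtained; the investment and GDP series here are simulated, and only the names are taken from the example:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
gdp = rng.normal(100, 15, 200)
investment = 3 + 0.5 * gdp + rng.normal(0, 4, 200)

def standardize(v):
    return (v - v.mean()) / v.std(ddof=1)

# No constant is added: the standardized intercept is zero by construction
res = sm.OLS(standardize(investment), standardize(gdp)).fit()
print(res.params[0])  # the beta coefficient, in standard-deviation units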
• This procedure is useful because standardizing the variables puts all regressors on an equal basis, so we can compare them directly
  • If the coefficient of one standardized regressor is larger (in absolute value) than that of another standardized regressor in the same model, then the former contributes relatively more to the explanation of the regressand than the latter
• For example:
  investment* = 0.8GDP* − 1.6interest*
  • Here we can say directly that the interest rate is twice as important as GDP in explaining variations in investment
Why is the standardized intercept zero? From Gujarati (2004), page 62, equation 3.1.7: b̂1 = ȳ − b̂2x̄, i.e.,
  intercept = (mean of y) − slope × (mean of x)
• If we standardize y and x, then we know that:
  • mean of y* = 0
  • mean of x* = 0
• It then follows right away that the intercept b̂1* = ȳ* − b̂2*x̄* = 0 − b̂2* · 0 = 0
More on functional form
• Using logs: recall from week 2 (slide 34) the approximation %Δy ≈ 100·Δlog(y)
• This approximation becomes less accurate as the change in log(y) gets large
• For the model log(ŷ) = b̂0 + b̂1x, the more accurate way to describe how much y changes when x increases by one unit is:
  %Δŷ = 100[exp(b̂1) − 1]
• For example, if log(ŷ) = 10 + 0.04x:
  • 100(exp(0.04) − 1) ≈ 4.08
  • A one-unit increase in x increases y by about 4.08% on average (the simple approximation gives 4%)
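A short comparison of the 100·b̂1 approximation with the exact formula, showing how they diverge as the coefficient grows:

import numpy as np

for b in [0.04, 0.10, 0.30, 0.70]:
    approx = 100 * b               # the simple approximation
    exact = 100 * (np.exp(b) - 1)  # the exact percentage change
    print(f"b = {b:.2f}: approx {approx:6.2f}%  exact {exact:6.2f}%")
# b = 0.04 gives 4.00% vs 4.08%; b = 0.70 gives 70.00% vs 101.38%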
• Quadratic model:
  y = b0 + b1x + b2x² + u
• In a quadratic model we cannot interpret the estimated parameters for x and x² separately
• That is, it does not make sense to interpret b1 while holding x² fixed
• Instead, graph the quadratic model holding the other regressors (if any) fixed
• Quadratic model example:
  wage = 10 + 2age − 0.03age²
  Δwage/Δage ≈ 2 − 0.06age
  At the (local) maximum, Δwage/Δage = 0, which occurs at age ≈ 33.3 years (see the check below)
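The turning point follows from the general formula −b1/(2·b2); a quick numerical confirmation:

import numpy as np

b1, b2 = 2.0, -0.03
print(-b1 / (2 * b2))      # 33.33...: the age at which the quadratic peaks

age = np.linspace(20, 50, 301)
wage = 10 + b1 * age + b2 * age**2
print(age[wage.argmax()])  # ~33.3 on the grid, matching the formula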
• Interaction term
• Sometimes it is more intuitive to interact two (or more) independent variables
• Example:
  houseprice = b0 + b1sqrft + b2bdrms + b3sqrft·bdrms + u
• The partial effect of bedrooms is b2 + b3sqrft
• If b3 > 0, an additional bedroom raises houseprice by more in larger houses (higher sqrft)
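Evaluating the partial effect b2 + b3·sqrft at a few house sizes; the coefficient values below are hypothetical, chosen only to illustrate the pattern:

b2 = 10.0  # hypothetical bdrms coefficient (price in $1000s)
b3 = 0.02  # hypothetical sqrft*bdrms interaction coefficient

for sqrft in (1000, 2000, 3000):
    print(sqrft, b2 + b3 * sqrft)  # effect of one extra bedroom grows with size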
More on goodness of fit and selection of regressors
• Recall from the week 3 lecture regarding R²:
  • It never decreases, and usually increases, when another independent variable is added
  • This holds even if the additional independent variable is not relevant
• Adjusted R²
  • Adjusted R² imposes a penalty for adding additional regressors to a model:
    adjR² = 1 − (1 − R²)(n − 1)/(n − k − 1)
  • While R² always lies between 0 and 1, adjR² can be negative (see the example below)
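The formula as a one-line function, with a case showing a negative adjusted R²:

def adj_r2(r2, n, k):
    """adjR^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), for n obs and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adj_r2(0.10, 30, 2))  # 0.033: a mild penalty
print(adj_r2(0.10, 30, 8))  # -0.243: negative despite R^2 = 0.10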
• R² and adjR² cannot be used to compare two regressions that have different dependent variables, even if the difference lies only in the functional form of the dependent variable:
  • First regression: y = b0 + b1x1 + b2x2 + u
  • Second regression: ln(y) = b0 + b1x1 + b2x2 + u
Prediction and residual analysis
• Confidence intervals for prediction
  • A prediction (the predicted value of y) from an econometric model can be confined within a prediction interval
  • For example, regressing ln(wealth) on hieduc, age, and male gives:
    ln(wealth) = 16.8 + 1.28hieduc + 0.01age − 0.08male
  • From here, we can predict the average wealth of a 30-year-old female with higher education: ln(wealth) = 16.8 + 1.28 + 0.3 = 18.38 → exp(18.38) ≈ 96 million
  • But how accurate is our prediction?
  ln(wealth) = 16.85 + 1.28hieduc + 0.01age − 0.08male
  • After some rounding, the average wealth of a 30-year-old female with higher education: ln(wealth) = 18.21
  • To obtain the prediction interval for this particular female we need to:
    1. Construct a new set of variables: hieduc0 = hieduc − 1, age0 = age − 30, male0 = male − 0
    2. Regress ln(wealth) on hieduc0, age0, and male0:
       ln(wealth) = 18.21 + 1.28hieduc0 + 0.01age0 − 0.08male0
       standard errors: (0.05) (0.05) (0.00) (0.03)
       As you can see, the only difference is the intercept
    3. Compute the CI: intercept ± c·se(intercept) [cf. week 4 slide 44]
       For 10,423 observations and 5% significance, c ≈ 1.96:
       18.21 ± 1.96(0.05) → the predicted ln(wealth) for a 30-year-old female with higher education is between 18.11 and 18.31
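A sketch of the recentering trick on simulated data (the IFLS coefficients above are not reproduced exactly): shifting each regressor by the value of interest makes the new intercept equal the prediction, so its standard error is the prediction's standard error:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
hieduc = rng.integers(0, 2, n).astype(float)
age = rng.uniform(20, 60, n)
male = rng.integers(0, 2, n).astype(float)
lnwealth = 16.8 + 1.28 * hieduc + 0.01 * age - 0.08 * male + rng.normal(0, 1, n)

# Person of interest: female (male = 0), age 30, higher education (hieduc = 1)
X0 = sm.add_constant(np.column_stack([hieduc - 1, age - 30, male - 0]))
res = sm.OLS(lnwealth, X0).fit()

pred, se = res.params[0], res.bse[0]       # intercept = prediction; bse = its s.e.
print(pred - 1.96 * se, pred + 1.96 * se)  # 95% CI for the predicted ln(wealth)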
• Predicting y when log(y) is the dependent variable
  • We have seen examples where the dependent variable is in log form, such as ln(wealth)
  • How do we recover the predicted value of y itself?
  • One cannot simply exponentiate the predicted ln(y); doing so systematically under-predicts y
  • Steps:
    1. Obtain the fitted values of ln(y) from the regression (call them ly_hat)
    2. Create m = exp(ly_hat)
    3. Regress y (here, wealth) on m without a constant
    4. Create y_hat = a0*exp(ly_hat), where a0 is the estimated parameter from step 3
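The four steps as a sketch on simulated data; wealth from the slides is replaced by a generic y:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, 300)
y = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.6, 300))

X = sm.add_constant(x)
ly_hat = sm.OLS(np.log(y), X).fit().fittedvalues  # step 1: fitted ln(y)
m = np.exp(ly_hat)                                # step 2
a0 = sm.OLS(y, m).fit().params[0]                 # step 3: regress y on m, no constant
y_hat = a0 * np.exp(ly_hat)                       # step 4: adjusted prediction

print(a0)  # > 1 here: naive exp(ly_hat) would under-predict y on average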
