Multiple Regression Analysis Further Issues

1. The document discusses various issues related to multiple regression analysis, including the effect of data scaling on regression statistics, standardized regression, and different functional forms such as logarithmic, quadratic, and interaction terms.
2. Data scaling will change the estimated regression parameters but not the statistical significance or other statistics like t-statistics, F-statistics, and R-squared. Standardized regression allows direct comparison of different regressors.
3. Functional forms like logarithmic can accommodate non-linear relationships, quadratic models allow for non-monotonic relationships, and interaction terms capture how the effect of one regressor depends on the level of another.



BASIC ECONOMETRICS
Elan Satriawan
M. Ryan Sanjaya

FEB UGM
10 May 2013
Multiple regression analysis: further issues
The effect of data scaling on OLS statistics
• Sometimes it is more intuitive to rescale variable(s) used
in a regression
• E.g., using per capita GDP rather than GDP, wealth in million Rp
rather than wealth in Rp, etc.
• Data scaling will change the estimated parameters but not
the statistical significance
ŷ / m = b̂0 / m + (b̂1 / m)x1 + (b̂2 / m)x2
• Example: wealth in Indonesia
• IFLS4 (2007)
• Wealth (explicit): house, properties, vehicles, household
equipment, saving/deposit account, stocks, etc
• Rescale the dependent variable
• First regression (wealth in Rp)
• Second regression (wealth in million Rp)
• Data scaling changes the estimated parameters but not the statistical significance (rescaling the dependent variable)
• The estimated parameters in the second regression are equal to the estimated parameters in the first regression divided by 1 million
• The same applies to the standard errors and confidence intervals
• F-statistics, t-statistics, p-values, and R2 all remain the same
• Similarly when we rescale an independent variable
• Define age_mo = age*12 (age in months)
• First regression: wealth on age, male, hieduc
• Second regression: wealth on age_mo, male, hieduc
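As a quick numerical check, the scaling claims above can be sketched with a small simulation. This is a sketch with made-up data, not the IFLS regressions: the variable names (age, male, wealth) mirror the example but all numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 60, n)
male = rng.integers(0, 2, n).astype(float)
wealth = 1e6 * (5 + 0.2 * age + 1.0 * male + rng.normal(0, 2, n))  # Rp

def ols(y, X):
    """OLS with intercept: return coefficients and t-statistics."""
    X = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, b / se

X = np.column_stack([age, male])
b_rp, t_rp = ols(wealth, X)        # wealth in Rp
b_mn, t_mn = ols(wealth / 1e6, X)  # wealth in million Rp
assert np.allclose(b_rp / 1e6, b_mn)   # coefficients scale by 1/m
assert np.allclose(t_rp, t_mn)         # t-statistics unchanged

X12 = np.column_stack([age * 12, male])  # age_mo = age*12
b_mo, t_mo = ols(wealth, X12)
assert np.isclose(b_rp[1] / 12, b_mo[1])  # only the age coefficient changes
assert np.allclose(t_rp, t_mo)            # t-statistics again unchanged
```

The assertions hold exactly (up to floating point): rescaling y divides every coefficient and standard error by the same factor, so all ratios survive.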
• First regression (wealth on age, male, hieduc)
• Second regression (wealth on age_mo, male, hieduc)
• Data scaling changes the estimated parameters but not the statistical significance (rescaling an independent variable)
• The estimated parameters in the second regression are equal to those in the first regression, except for the rescaled variable (age_mo)
• The estimated parameter for age_mo = estimated parameter for age divided by 12
• The same applies to the standard error and confidence interval of age_mo
• F-statistics, t-statistics, p-values, and R2 all remain the same
• Standardized regression (beta coefficients) (materials taken from Gujarati 2004)
• Instead of regressing y on x, we may want to regress yi* on xi*, where:
yi* = (yi – ȳ)/sy    sy: sample standard deviation of y
xi* = (xi – x̄)/sx    sx: sample standard deviation of x
• It can be shown that ȳ* = x̄* = 0 and sy*2 = sx*2 = 1
Proof
assume that: ȳ = 10, sy = 4, n = 50, with data (i, yi): (1, 20), (2, 5), (3, 5), (4, 8), …, (50, 10)

by definition: ȳ* = (1/n) Σ yi* = (1/50) Σ (yi – 10)/4

ȳ* = (1/50) { (20 – 10)/4 + (5 – 10)/4 + (5 – 10)/4 + (8 – 10)/4 + … + (10 – 10)/4 }

ȳ* = (1/4)(1/50) { (20 – 10) + (5 – 10) + (5 – 10) + (8 – 10) + … + (10 – 10) }

ȳ* = (1/sy) · Σ(yi – ȳ)/n = 0,  since Σ(yi – ȳ) = 0
by definition: var(y) = sy2 = Σ(yi – ȳ)2 / (n – 1);  yi* = (yi – ȳ)/sy;  ȳ* = 0

sy*2 = Σ(yi* – ȳ*)2 / (n – 1)

sy*2 = Σ( (yi – ȳ)/sy – 0 )2 / (n – 1)

sy*2 = Σ(yi – ȳ)2 / ( sy2 (n – 1) )

since Σ(yi – ȳ)2 = sy2 (n – 1), it follows that

sy*2 = sy2 (n – 1) / ( sy2 (n – 1) ) = 1
yi* = b1* + b2* xi* + ui*
• It can be shown that b1* = 0, thus
yi* = b2* xi* + ui*
• For example:
Investment* = 0.94GDP*
• How would we interpret b2* = 0.94?
• If (standardized) GDP increases by one standard deviation then, on average, (standardized) investment increases by about 0.94 standard deviations
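The beta-coefficient idea can be sketched numerically. This uses synthetic gdp/investment data, and checks the known bivariate identity that the standardized slope equals the raw slope rescaled by sx/sy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
gdp = rng.normal(100, 15, n)
investment = 3 + 0.5 * gdp + rng.normal(0, 4, n)

def standardize(v):
    """Subtract the sample mean and divide by the sample std deviation."""
    return (v - v.mean()) / v.std(ddof=1)

y_star, x_star = standardize(investment), standardize(gdp)
assert abs(y_star.mean()) < 1e-9          # standardized mean is 0
assert np.isclose(y_star.std(ddof=1), 1)  # standardized std is 1

# OLS of y* on x* (no intercept needed: both means are zero)
beta = (x_star @ y_star) / (x_star @ x_star)

# beta equals the raw slope rescaled by sx/sy
slope = np.cov(gdp, investment, ddof=1)[0, 1] / gdp.var(ddof=1)
assert np.isclose(beta, slope * gdp.std(ddof=1) / investment.std(ddof=1))
```

Because both standardized variables have unit standard deviation, beta is read directly in standard-deviation units, which is what makes regressors comparable.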
• This procedure is useful because by standardizing the variables we put all of the regressors on an equal basis, so we can compare them directly
• If the coefficient of one standardized regressor is larger (in absolute value) than that of another standardized regressor in the same model, then the former contributes relatively more to the explanation of the regressand than the latter
• For example:
investment* = 0.8GDP* – 1.6interest*
• Here we can directly say that the interest rate is twice as important as GDP in explaining variations in investment
From Gujarati (2004), page 62, equation 3.1.7: b̂1 = ȳ – b̂2x̄

intercept = (mean of y) – slope·(mean of x)

• If we standardize y and x, then we know that
• mean of y* = 0
• mean of x* = 0
• Then it follows right away that the intercept = 0
More on functional form
• Using log: recall from week 2 slide 34 the approximation that a change in log(y) is roughly a proportionate change in y, i.e. %Δy ≈ 100·Δlog(y)
• This approximation gets less accurate as the change in log(y) gets larger
More on functional form
log(ŷ) = b̂0 + b̂1x
• A more accurate way to describe how much y changes due to a one-unit change in x is:
%Δy = 100[exp(b̂1) – 1]
• For example: log(ŷ) = 10 + 0.04x
• 100(exp(0.04) – 1) ≈ 4.08
• A one-unit increase in x will increase y by about 4.08% on average
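The approximate and exact percentage changes can be compared directly; the numbers below reproduce the slide's example:

```python
import math

b1 = 0.04  # estimated coefficient on x in log(y) = b0 + b1*x
approx = 100 * b1                 # approximation: %Δy ≈ 100*b1
exact = 100 * (math.exp(b1) - 1)  # exact: %Δy = 100[exp(b1) - 1]
assert round(exact, 2) == 4.08    # matches the slide
assert exact > approx             # exact effect is always slightly larger for b1 > 0

# The gap widens for large coefficients, e.g. b1 = 0.5:
assert 100 * (math.exp(0.5) - 1) - 100 * 0.5 > 10  # ~64.9% vs 50%
```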
More on functional form
• Using log
• Quadratic model
y = b0 + b1x + b2x2 + u
• When using a quadratic model, we cannot interpret the estimated parameters for x and x2 separately
• That is, it does not make any sense to interpret b1 while holding x2 fixed, since x2 changes whenever x does
• Graph the quadratic model holding other regressors (if any) fixed
• Quadratic model
wage = 10 + 2age – 0.03age2

Δwage/Δage ≈ 2 – 0.06age
At (local) maximum: Δwage/Δage = 0 when age ≈ 33.3 years old
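The marginal effect and turning point of the wage example can be computed directly from the estimated coefficients:

```python
b1, b2 = 2.0, -0.03  # wage = 10 + 2*age - 0.03*age^2

def marginal_effect(age):
    """Δwage/Δage for the quadratic model: b1 + 2*b2*age."""
    return b1 + 2 * b2 * age

turning_point = -b1 / (2 * b2)  # age at which the marginal effect is zero
assert round(turning_point, 1) == 33.3          # matches the slide
assert marginal_effect(20) > 0 > marginal_effect(40)  # wage rises, then falls
```

The general formula is x* = –b1/(2·b2): the wage profile is increasing before age 33.3 and decreasing after.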
More on functional form
• Using log
• Quadratic model
• Interaction term
• Sometimes it is more intuitive to interact two (or more) independent
variables
• Example:

houseprice = b0 + b1sqrft + b2bdrms + b3sqrft*bdrms + u

• The partial effect of bedrooms is: b2 + b3sqrft
• If b3 > 0, an additional bedroom increases houseprice by more in larger houses (higher sqrft)
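The partial effect b2 + b3·sqrft can be sketched numerically; b2 and b3 here are made-up illustrative values, not estimates from any real data:

```python
b2, b3 = -20.0, 0.03  # hypothetical estimates for bdrms and sqrft*bdrms

def bdrms_effect(sqrft):
    """Partial effect of one extra bedroom at a given house size."""
    return b2 + b3 * sqrft

# With b3 > 0, an extra bedroom adds more value in larger houses
assert bdrms_effect(2000) > bdrms_effect(1000)
```

Note that with these values the effect is negative for small houses (below about 667 sqrft an extra bedroom lowers the predicted price), which is exactly why the partial effect must be evaluated at a specific sqrft.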
More on goodness of fit and selection of regressors
• Recall from our lecture in week 3 with regard to R2:
• It usually increases when another independent variable is added
• It increases even if the additional independent variable is not relevant
• Adjusted R2
• Adjusted R2 imposes a penalty for adding additional regressors to a model:
adjR2 = 1 – (1 – R2)(n – 1)/(n – k – 1)
• While R2 always lies between 0 and 1, adjR2 can be negative
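The adjusted-R2 formula above is a one-liner, and plugging in numbers shows how the penalty can push it below zero:

```python
def adj_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# A tiny R^2 with many regressors yields a negative adjusted R^2
assert adj_r2(0.02, n=30, k=10) < 0
# A decent fit with few regressors is only mildly penalized
assert 0 < adj_r2(0.50, n=30, k=3) < 0.50
```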
• R2 and adjR2 cannot be used to compare two regressions that have different dependent variables, even if the only difference lies in the functional form of the dependent variable
• First regression: y = b0 + b1x1 + b2x2 + u
• Second regression: ln(y) = b0 + b1x1 + b2x2 + u
Prediction and residual analysis
• Confidence intervals for prediction
• A prediction (predicted value of y) from an econometric model can be reported together with a prediction interval
• For example, when regressing ln(wealth) on hieduc, age, and male we get
ln(wealth) = 16.8 + 1.28hieduc + 0.01age – 0.08male
• From here, we can predict the average wealth of a female at age 30 with higher education: ln(wealth) = 16.8 + 1.28 + 0.3 = 18.38 → exp(18.38) ≈ 96 million
• But how accurate is our prediction?
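The back-of-envelope prediction can be reproduced from the slide's coefficients (ln_wealth is a helper name introduced here, not from the original material):

```python
import math

def ln_wealth(hieduc, age, male):
    """Fitted ln(wealth) from the slide: 16.8 + 1.28*hieduc + 0.01*age - 0.08*male."""
    return 16.8 + 1.28 * hieduc + 0.01 * age - 0.08 * male

pred = ln_wealth(hieduc=1, age=30, male=0)
assert round(pred, 2) == 18.38                 # matches the slide
assert round(math.exp(pred) / 1e6) == 96       # about Rp 96 million
```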
ln(wealth) = 16.85 + 1.28hieduc + 0.01age – 0.08male
• After some rounding, the predicted ln(wealth) of a female at age 30 with higher education is 18.21
• In order to obtain the prediction interval for this particular female we need to:
1. Construct a new set of variables centered at her values: hieduc0 = hieduc – 1, age0 = age – 30, male0 = male – 0
2. Regress ln(wealth) on hieduc0, age0 and male0:
ln(wealth) = 18.21 + 1.28hieduc0 + 0.01age0 – 0.08male0
standard errors (0.05) (0.05) (0.00) (0.03)
As you can see, the only difference is the intercept
3. Compute the CI: intercept ± c·se(intercept) [cf. week 4 slide 44]
For 10,423 observations and 5% significance, c ≈ 1.96
18.21 ± 1.96(0.05) → predicted ln(wealth) for a female age 30 with higher education is between 18.11 and 18.31
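Step 3 is a one-line computation with the numbers from the re-centered regression:

```python
# Prediction interval for ln(wealth), using the slide's numbers:
# re-centered intercept 18.21, its standard error 0.05, and c = 1.96
intercept, se, c = 18.21, 0.05, 1.96
lo, hi = intercept - c * se, intercept + c * se
assert (round(lo, 2), round(hi, 2)) == (18.11, 18.31)  # matches the slide
```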
• Predicting y when log(y) is the dependent variable
• We have seen examples where the dependent variable is in log form, such as ln(wealth)
• How do we predict y itself from the predicted value of ln(y)?
• One cannot simply exponentiate the predicted ln(y)
• Steps:
1. Obtain fitted values of ln(y) from the regression (let's call it ly_hat)
2. Create m = exp(ly_hat)
3. Regress y (here, wealth) on m without a constant
4. Create y_hat = a0*exp(ly_hat), where a0 is the estimated parameter from step 3 above
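The steps above can be sketched with simulated data (the true model below is log-linear by construction; ly_hat, m, a0, and y_hat follow the slide's naming):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 10, n)
y = np.exp(1.0 + 0.2 * x + rng.normal(0, 0.5, n))  # log-linear true model

# Step 1: regress log(y) on x, obtain fitted values ly_hat
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
ly_hat = X @ b

# Step 2: m = exp(ly_hat)
m = np.exp(ly_hat)

# Step 3: regress y on m without a constant -> a0
a0 = (m @ y) / (m @ m)

# Step 4: predicted y
y_hat = a0 * np.exp(ly_hat)

# Naively exponentiating ly_hat tends to underpredict E[y|x];
# a0 (here near exp(sigma^2/2) for lognormal errors) corrects for that
assert 0.9 < a0 < 1.4
assert y_hat.shape == y.shape
```

The correction factor a0 is what distinguishes this from simply computing exp(ly_hat): with well-behaved errors it exceeds 1, scaling the naive prediction up.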
