
AMEFA – Analytical Methods for Economic and Financial Analysis
Jönköping University
Pär Sjölander
Part 04 - AMEFA - 2024 - Introduction and Repetition
Simple Linear Regression vs. Multiple Linear Regression

Simple Linear OLS regression:
BodyLength_i = β0 + β1·Weight_i + ε_i

Multiple Linear OLS regression:
BodyLength_i = β0 + β1·Weight_i + β2·TailLength_i + ε_i
• The purpose in both simple and multiple linear regression is to estimate the parameters
(β0 , β1 , β2 , β3 , …, etc.) that minimize the residual sum of squares.
• In simple regression, this is achieved by fitting the best straight line through the data.
• In multiple regression, we aim to find the best-fitting plane (with two predictors) or hyperplane (with more than two predictors) in a multidimensional space.
Multiple Linear Regression

Thus, multiple linear OLS regression is rather similar to simple linear OLS regression,
but with more variables – and more assumptions that need to be satisfied.
Multiple Linear Regression
• Simple Linear Regression can be extended (generalized) to models with more than
one independent variable, for example, a model with p variables, like:

Y_i = β0 + β1·X_1i + β2·X_2i + … + βp·X_pi + u_i

• The formulae for the OLS estimators in multiple linear regression become complicated
when more than two independent variables are involved; one needs matrix algebra.
In this course we instead use statistical software packages to solve this problem.

• A slope βk, for k = 1, …, p, is interpreted as the change in the dependent
variable Y per unit change in the independent variable Xk, holding the values of all
other independent variables fixed/constant (ceteris paribus).
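As a minimal illustration (a sketch using Stata's built-in auto dataset, not the course data), one regress command estimates all slopes at once, and each reported slope is the ceteris paribus effect of its own variable:

* Minimal sketch in Stata (built-in auto data, purely for illustration)
sysuse auto, clear
regress price mpg weight
* each slope = the change in price per unit change in that regressor,
* holding the other regressor constant (ceteris paribus)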
OLS Classical Linear Regression Model (CLRM) Assumptions
Classical Linear Regression Model (CLRM) assumptions must be satisfied (except 10) for an estimator to be considered the Best Linear
Unbiased Estimator (BLUE). These assumptions are relevant to the Gauss-Markov theorem, which proves the BLUE property.
1) Linearity (linear model in parameters, not necessarily in variables, since e.g. log transformations can linearize variables).
2) Independence of the X's from the error terms, i.e., cov(X_ki, u_j) = 0, for all k = 1, …, p, and i, j = 1, …, n.
3) Zero mean error terms, i.e., E(u_i) = 0, for all i = 1, …, n.
4) Homoscedasticity (constant variance of the error terms), i.e., var(u_i) = σ², for all i = 1, …, n.
5) No autocorrelation between the error terms, i.e., cov(u_i, u_j) = 0, for all i, j = 1, …, n, where i ≠ j.
6) The number of observations must be larger than the number of parameters to be estimated (slopes and intercept), i.e., n > p + 1.
7) There must be variation in the values of the X's (other than the intercept), i.e., there is no k such that X_ki = c for all i = 1, …, n.
8) No perfect multicollinearity between the X's.
9) No specification bias, i.e., the model should be specified correctly (e.g. not overspecified/underspecified).
10) Assumption of error normality, i.e., u_i ~ N(0, σ²). Normality is not required for OLS to be BLUE (minimum-variance coefficient estimates among linear unbiased estimators). Error normality is necessary for small-sample hypothesis testing (e.g. t-tests) but less important for large samples due to the Central Limit Theorem (CLT). (For MLE, however, normality is necessary for defining the likelihood function.) More clarifications on the next slide →
While normally distributed errors are a common assumption, normality is not necessary for OLS to provide BLUE (Best Linear Unbiased Estimator) estimates. In simple terms, even if the errors are not normally distributed, OLS can still give us good estimates of the parameters. The assumption becomes important when we want to do hypothesis testing, i.e., make precise statements about how often our estimates might be off. With a lot of data, the Central Limit Theorem helps us, because the estimators will tend to follow a normal distribution even if the errors do not; but for small samples, the common statistical tests rely on the normality assumption. In contrast, for methods like Maximum Likelihood Estimation (MLE), which is a different estimation approach, assuming normality is crucial for defining the likelihood function. For MLE: if the assumption of error normality is violated, the MLE estimators do not necessarily lose their consistency, but they may not be efficient or unbiased, particularly in small samples. In large samples, however, MLE estimators often retain their consistency and asymptotic efficiency even if the error terms are not normally distributed.
What is BLUE? * Note: To be correct, though not crucial to memorize, the Gauss-Markov theorem asserts in a linear regression context: if the errors exhibit zero
mean, constant variance (homoscedasticity), and no correlation, then the OLS estimators are distinguished as the Best Linear Unbiased Estimators (BLUE).
However, roughly speaking we say that the CLRM assumptions (except error normality) should hold – then we can use OLS – and it will be BLUE.

The term "BLUE" in econometrics stands for "Best Linear Unbiased Estimator." The estimator (that
gives the coef.) is:
- Best: The estimator has the smallest variance among all linear unbiased estimators.
- Linear: A linear function of the observed data.
- Unbiased: Has an expected value equal to the true parameter value (not systematically over or
underestimating it).
Under the Classical Linear Regression Model (CLRM) assumptions*, the Ordinary Least Squares (OLS)
estimators are the Best Linear Unbiased Estimators (BLUE). This implies that, within the class of all
linear unbiased estimators, the OLS estimators have the minimum variance, making them the most
efficient estimators of the parameters.
For OLS regression, the property of being BLUE relates to the estimators of the regression
parameters (e.g. the slope coefficients), rather than to the residuals/errors or to the hypothesis
testing (by e.g. t-tests, p-values, S.E.) of these parameters. Hypothesis testing is not related to BLUE.

TO CONCLUDE: BLUE → the OLS estimators of the parameters have minimum variance (among linear unbiased estimators)!
However, we simplify; to be precise there is a distinction: an estimator is a rule or formula that tells you how to calculate the estimate of a parameter from your data. For example, the sample mean (X̄) is an estimator of the population mean (μ), and
the sample variance (S²) is an estimator of the population variance (σ²). It is actually this estimator we refer to, not the parameter.

• Note: The assumption of error normality is not a requirement for an estimator to be BLUE. However, it is
a requirement for reliable hypothesis testing (reject/not reject H0) and interval estimation
(confidence intervals)…
…thus error normality is a separate concern from the estimator's properties as defined by the Gauss-Markov theorem.
TO CONCLUDE: Error Normality → hypothesis testing works (e.g. “rejecting/not rejecting” H0)!
BUT
CLT and OLS Assumptions / BLUE (MLE = Maximum Likelihood Estimation)

Without Error Normality – non-large samples:
• OLS coefficients: remain BLUE, yet the SEs may lack accuracy.
• MLE: consistent, but inefficient (or at risk of inefficiency, since the SEs may be inaccurate).
With Error Normality – non-large samples:
• OLS: remains BLUE, with sound SEs even in small samples.
• MLE: remains efficient, with reliable SEs in small samples.
Large samples – both OLS and MLE work well (or are at least more likely to work well!).
The Central Limit Theorem supports accuracy in hypothesis testing and confidence intervals, easing
concerns about the error-term distribution: the sampling distributions of the estimators are asymptotically normal in large samples.
The details… (this new additional information is something that you do not need to memorize)
OLS:
• BLUE: Under the Gauss-Markov assumptions (linearity, independence, homoscedasticity, no autocorrelation, no multicollinearity).
• Consistent: As sample size increases, estimators converge in probability to the true parameter values.
• Asymptotically Normal: The distribution of the estimator approaches a normal distribution as sample size increases.
• Note: "Asymptotically Efficient" within the class of linear unbiased estimators.
MLE:
• Consistent: As sample size increases, estimators converge in probability to the true parameter values.
• Asymptotically Normal: The distribution of the estimator approaches a normal distribution as sample size increases.
• Asymptotically Efficient: In large samples, MLE achieves the lowest possible variance among all consistent estimators.
• Note: Not BLUE, as BLUE is specific to OLS estimators under the Gauss-Markov theorem.
In summary:
• BLUE: Related to the properties of the estimator (of regression coefficients), not the residuals.
• Accurate SEs: Are not related to BLUE but are important for valid hypothesis testing (e.g. t-tests) and
interval estimation (conf. intervals).
Coefficient of Determination R² vs. adjusted R̄² - repetition
• The coefficient of determination R² for a multiple linear regression is the proportion of the
variation in the dependent variable explained by the estimated linear model based on the p independent
variables.
• The coefficient of determination R² never decreases when irrelevant independent variables are added to
the model. Therefore, it is not reliable as a goodness-of-fit measure when comparing
models with different numbers of independent variables (with the same Y values).
• To correct for the increase in R² due to extra variables in the model, an adjusted R², denoted R̄², is
used. The model with the higher R̄² is preferred (if the Y values are the same).
R² = 1 − RSS/TSS
R̄² = 1 − (RSS/df_RSS) / (TSS/df_TSS) = 1 − (RSS/(n − p − 1)) / (TSS/(n − 1))
• However, note that the adjusted R̄² does not have an interpretation like R². So the adjusted R̄² is used to select between different models
(to determine which one has the best fit and should be chosen).
• When the adjusted R̄² has selected the best model (in comparison to other models with a different number of independent
variables), then use R² to report how much of the total variation is explained by the independent variables.
• If two models have the same number of independent variables → R² works fine both for model comparison and interpretation.
RSS = Residual SS
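For illustration, using the figures reported for Example 1 later in these slides (R² = 0.995, n = 20, p = 2), and the identity R̄² = 1 − (1 − R²)·(n − 1)/(n − p − 1) that follows from R² = 1 − RSS/TSS:
R̄² = 1 − (1 − 0.995)·(20 − 1)/(20 − 2 − 1) = 1 − 0.005·(19/17) ≈ 0.9944
so the adjustment lowers R² only slightly here, since p is small relative to n.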
Statistical Inference in Multiple Linear Regression
• Under error normality assumption, similar to the simple linear regression, we
can make statistical inference for the OLS estimators of the multiple linear
regression, as explained below.
1) The estimators of the multiple linear regression coefficients, the β̂k's, follow
normal distributions.
2) The error variance estimator σ̂² is related to a Chi-square distribution
with n − p − 1 degrees of freedom.
3) If the error variance σ² is replaced by its estimator σ̂², the standardized
parameter estimators follow a t distribution with df = n − p − 1.
• The above distributions can be used in hypothesis testing and estimating
confidence intervals, the same way as in the simple linear regression.
• ANOVA (F test) is used to test the overall significance of the estimated model.
OLS Significance Tests
Under the normality assumption for iid (identically and independently distributed) regression
error terms, hypotheses about regression parameters are tested using the t distribution, as
shown below. In the hypotheses, β represents the population parameter value that you are
testing against (e.g. 0 or 0.75), and β̂i is the estimated coefficient from your OLS regression model.
For example: (i) if we want to test whether the intercept is significantly different from 0, we set β = 0
and write H0: β0 = 0; (ii) if we want to test whether a slope is significantly different from 0.75, we set
β = 0.75 and write H0: β3 = 0.75.
Null Hypothesis | Alternative Hypothesis | Reject Null Hypothesis if:
H0: βi = β | H1: βi ≠ β | |T| > t(α/2, df)
H0: βi ≤ β | H1: βi > β | T > t(α, df)
H0: βi ≥ β | H1: βi < β | T < −t(α, df)
where T = (β̂i − β) / s.e.(β̂i)
Assume a 5% significance level and n = 1000.

Two-tailed test: H0: β1 = β, H1: β1 ≠ β. Critical values tcrit = −1.96 and tcrit = 1.96; reject H0 if the test statistic tobs falls in either red tail region, i.e. if Abs(tobs) > tcrit.
Left-tailed test: H0: β1 = β, H1: β1 < β. Critical value tcrit = −1.645; reject H0 if tobs falls in the left red tail region.
Right-tailed test: H0: β1 = β, H1: β1 > β. Critical value tcrit = 1.645; reject H0 if tobs falls in the right red tail region.
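In Stata, the critical values shown above can be looked up with the invttail() function; the exact degrees of freedom df = n − p − 1 depend on the number of slopes p (df = 996 below assumes p = 3, an arbitrary illustration), but with n = 1000 any such df gives essentially the normal-distribution values:

* Critical t values at the 5% significance level (df = 996 is an assumed example)
display invttail(996, 0.025)   // two-tailed test: approx. 1.96
display invttail(996, 0.05)    // one-tailed test: approx. 1.645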
F-Test in OLS Multiple Linear Regression
• Recall the total sum of squares partitioning into its components; explained sum of
squares and residual sum of squares, as shown below.
TSS = ESS + RSS
• Under the iid error normality assumption, we have
F = (ESS/p) / (RSS/(n − p − 1)) ~ F(p, n − p − 1)
with n the sample size and p the number of explanatory variables (excluding the
intercept).
• Using F-test, the hypotheses are stated as follows.
H0: β1 = β2 = … = βp = 0
H1: β1 ≠ 0 or β2 ≠ 0 or … or βp ≠ 0
• Under the null hypothesis, a null model (with only intercept) is equally good as when
including the explanatory variables.

ESS = Explained SS, RSS = Residual SS
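After regress, Stata stores the overall F statistic and its degrees of freedom in e(), so the test of overall significance can be inspected directly; a minimal sketch on Stata's built-in auto data (not the course data):

sysuse auto, clear
regress price mpg weight
display e(F)                            // overall F statistic, distributed F(p, n-p-1) under H0
display Ftail(e(df_m), e(df_r), e(F))   // the corresponding p-value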


ANOVA Table in OLS Multiple Linear Regression
• ANOVA test for the OLS estimation method of multiple linear regression is arranged in
the ANOVA table, like this:
S.O.V (Source of Variation) | df (Degrees of Freedom) | SS (Sum of Squares) | MS (Mean Squared) | F (F Test Statistic)
Due to Regression (ESS) | number of slopes (p) | ESS | MSR = ESS/df | F = MSR/MSE
Due to Error (RSS) | sample size minus number of regression parameters (n − p − 1) | RSS | MSE = RSS/df | –
Total Sum (TSS) | sample size minus one (n − 1) | TSS | – | –

• The F test is a test of the overall effect of the explanatory variables on the dependent variable,
i.e., it is a test of model significance. The F test statistic follows F ~ F(p, n − p − 1).
• The MSE from the table is the same as the regression error variance estimator σ̂².
S.O.V (Source of Variation), df (Degrees of Freedom), SS (Sum of Squares), MS (Mean Squared), F (F Test Statistic), ESS (Explained
Sum of Squares, also known as SSR, Sum of Squares due to Regression), RSS (Residual Sum of Squares, a.k.a. SSE, Sum of Squares
Error), TSS (Total Sum of Squares, a.k.a. SST, Sum of Squares Total), MSR (Mean Square due to Regression), MSE (Mean Square Error)
Example 1: Cobb-Douglas.wf1/dta
Using the data in the table below, the Cobb-Douglas production function is estimated:
ln(GDP_t) = β0 + β1·ln(Employment_t) + β2·ln(Fixed_Capital_t) + u_t

' In EViews:
ls log(y) c log(l) log(k)

* In Stata (must log first):
gen log_y = log(y)
gen log_l = log(l)
gen log_k = log(k)
reg log_y log_l log_k

Year (T) | GDP (Y) | Employment (L) | Fixed Capital (K)
1955 114043 8310 182113
1956 120410 8529 193749
1957 129187 8738 205192
1958 134705 8952 215130
1959 139960 9171 225021
1960 150511 9569 237026
1961 157897 9527 248897
1962 165286 9662 260661
1963 178491 10334 275466
1964 199457 10981 295378
1965 212323 11746 315715
1966 226977 11521 337642
1967 241194 11540 363599
1968 260881 12066 391847
1969 277498 12297 422382
1970 296530 12955 455049
1971 306712 13338 484677
1972 329030 13738 520553
1973 354057 15924 561531
1974 374977 14154 609825

• The coefficient of employment is ≈0.34, which represents the GDP elasticity for employment, meaning that a 1% change in employment increases GDP by 0.34%.
• The GDP elasticity for capital is ≈0.85, meaning that a 1% increase in capital increases GDP by 0.85%.
• The t statistics are the coefficient estimates divided by their standard errors (e.g. 1.829548 ≈ 0.339732/0.185692).
• The estimate for employment is not significant at α = 0.05, since its p-value (0.0849) is larger than 0.05.
• The F test statistic is 1719.231, with p-value 0.000, meaning that the overall estimated model is significant.
• The coefficient of determination is R² = 0.995. It means 99.5% of the variation in ln(GDP) (not GDP itself) is explained by
the estimated model ln(GDP_t) = −1.65242 + 0.33973·ln(Employment_t) + 0.845997·ln(Fixed_Capital_t).
Example 6: Regression hypothesis testing and prediction
Example 5: (F Test & ANOVA Table)
• Two-tailed test, since we test H0: βi = 0 and H1: βi ≠ 0.
• We measure the variable Age in years and not months. Therefore, since 96 months is 96/12 = 8 years, we substitute in 8.
• (Since Y is measured in thousands of dollars, see the definition above.)
• Note that the automatic p-value from the software is 2-tailed. However, our alternative hypothesis is here 1-tailed, since we have a 1-tailed β4 < 0 (and not a 2-tailed β4 ≠ 0). This p-value will not be useful for us; we need to split the p-value in two.
• …and this p-value is also almost 0 → highly significant.
Relation Between F and R2
• There is a relation between the F test statistic and the coefficient of determination R², as
shown below:
F = (ESS/p) / (RSS/(n − p − 1)) = (R²/p) / ((1 − R²)/(n − p − 1))
where p is the number of independent variables (excluding the intercept).


• If the F test statistic is not reported, a reported R² can be used to find the F test
statistic for a formal test of overall model significance.
• The F test should not be seen as a significance test of R² as such, but the F test* assesses the
significance of the overall regression model, which, if significant, also validates the
explanatory power indicated by the R² value.
* A significant F test indicates that your model as a whole has explanatory power beyond the mean of the
dependent variable; a higher R² value indicates a better fit, while the F test evaluates whether the
improvement in fit is statistically significant.
ESS = Explained SS, RSS = Residual SS
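As a check, plugging the figures from the Cobb-Douglas regression in Example 1 (R² = 0.995080, p = 2, n = 20) into the formula reproduces the reported F statistic:
F = (0.995080/2) / ((1 − 0.995080)/(20 − 2 − 1)) = 0.497540/0.000289 ≈ 1719,
which matches the reported F = 1719.231 (up to rounding of R²).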
Hypothesis testing for comparing nested models – terms
• If we impose zero restrictions on some of the coefficients of a model, the restricted model
is said to be nested inside the model without restrictions (the full model).
• For example, Model 2 is nested in Model 1 by the zero restrictions β3 = 0 and β5 = 0
(the coefficients of X3 and X5), as shown below:
Model 1: Y = β0 + β1·X1 + β2·X2 + β3·X3 + β4·X4 + β5·X5 + u
Model 2: Y = β0 + β1·X1 + β2·X2 + β4·X4 + u
• The full model (without restrictions) fits the data better, since it includes more
independent variables: its residual sum of squares is smaller and its R² is larger.
• The total sums of squares of the two models are equal, as long as both have the
same set of observations on the dependent variable Y.
• It is easiest to call them the restricted model and the unrestricted model (the full model).
F Test & Nested Models
• The residual sum of squares RSS_Restricted from the nested model is larger than the residual
sum of squares RSS_Unrestricted from the full model.
• We use an F test to check for a significant difference between the RSS's:
F = [(RSS_Restricted − RSS_Unrestricted)/(df_Restricted − df_Unrestricted)] / [RSS_Unrestricted/df_Unrestricted]
where F ~ F(df_Restricted − df_Unrestricted, df_Unrestricted),
and the null and alternative hypotheses are
H0: the restricted and unrestricted models are equivalent.
H1: the unrestricted model fits significantly better.
• With a large enough ΔRSS the F test is significant. If the F test is significant, the full model fits
significantly better than the restricted (nested) model.
• This is a useful technique to test whether some variables add value to
the model or not. The restrictions can be any linear transformations of the parameters.
RSS = Residual SS
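In Stata, the nested-model F test for the Model 1 / Model 2 example above can be run with the test command after estimating the full model (the variable names y and x1–x5 are placeholders, not course data):

* Sketch: F test of the zero restrictions that turn Model 1 into Model 2
regress y x1 x2 x3 x4 x5    // unrestricted (full) model
test x3 x5                  // H0: the coefficients on x3 and x5 are both zero
* test reports the F statistic, F(2, df_unrestricted), and its p-value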
Relation between F Test & R2 in Nested Models
• Since both the restricted (nested) model and the unrestricted (full) model have the
same total sum of squares, dividing both the numerator and the denominator of the F
test statistic by the total sum of squares TSS, we can write
F = [(RSS_Restricted/TSS − RSS_Unrestricted/TSS)/(df_Restricted − df_Unrestricted)] / [(RSS_Unrestricted/TSS)/df_Unrestricted]
F = [(R²_Unrestricted − R²_Restricted)/(df_Restricted − df_Unrestricted)] / [(1 − R²_Unrestricted)/df_Unrestricted]

RSS = Residual SS
In EViews: File/New/Workfile. Dated/Annual. Start date: 1955, End date: 1974. Quick/Empty Group (Edit Series). On the first row write in Column 1: T, Col 2: Y, Col 3: L, Col 4: K. Copy the data (the same table as in Example 1) and paste it into the cells. Then, in the command window:
ls log(y) c log(l) log(k)
AND THEN
ls log(y) c log(l) log(k) l k
For this last regression choose: View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions. Type in c(4)=c(5)=0 (since these are the two last estimated coefficients when log(y) c log(l) log(k) l k is estimated). Thus, we test the coefficients of L and K, that is the hypothesis H0: β3 = β4 = 0.

Using the data in the table (Example 1), the Cobb-Douglas production function is estimated:
Restricted model: ln(Y_t) = β0 + β1·ln(L_t) + β2·ln(K_t) + u_t
The transcendental production function (a generalization of the Cobb-Douglas function) is:
Unrestricted model (with all variables = no restrictions): ln(Y_t) = β0 + β1·ln(L_t) + β2·ln(K_t) + β3·L_t + β4·K_t + u_t
Is there any value added of these 2 extra variables in the unrestricted model?

To check if the Cobb-Douglas production function is the correct model, we can test the hypothesis H0: β3 = β4 = 0 (which are the 4th, c(4), and 5th, c(5), coefficients, since the intercept is c(1)).
The critical value is Fcrit(2,15) = 3.68 at the 5% level. (Fobs = 31.01923) > (3.68 = Fcrit). Therefore, the null hypothesis is rejected at the 5% significance level; the second, unrestricted model fits significantly better.
OR: The p-value for the F statistic in the Wald test is (p-value = 0.0000) < (0.05 = sign. level). Therefore, the null hypothesis is rejected at the 5% significance level; the unrestricted model fits significantly better.
* In Stata (must log first): Cobb-Douglas.wf1/dta
reg log_y log_l log_k
reg log_y log_l log_k l k
testparm l k

The data and the restricted/unrestricted models are the same as on the previous slide, and so are the conclusions: (Fobs = 31.02) > (3.68 = Fcrit(2,15)) and the Wald-test p-value (0.0000) < (0.05 = sign. level), so the null hypothesis H0: β3 = β4 = 0 is rejected at the 5% significance level; the second, unrestricted model fits significantly better.
Or equivalently, using one of these formulas (restricted model vs. unrestricted model with all variables = no restrictions):
• Numerator d.f. = number of restrictions; β3 and β4 are two restrictions = 2.
• Denominator d.f. for the unrestricted model = n − k − 1 = 20 − 4 − 1 = 15, since we have 4 slope coefficients.
• Or, F(d.f. difference restricted vs. unrestricted, d.f. unrestricted) = F(17 − 15, 15) = F(2, 15).

USING R²: To check if the Cobb-Douglas production function is the correct model, we can test the hypothesis H0: β3 = β4 = 0.
F = [(0.999042 − 0.995080)/((20 − 2 − 1) − (20 − 4 − 1))] / [(1 − 0.999042)/(20 − 4 − 1)] = 31.0177453 ~ F(2, 15)
where F ~ F(df_Restricted − df_Unrestricted, df_Unrestricted), or ~ F((20 − 2 − 1) − (20 − 4 − 1), (20 − 4 − 1)).
The critical value is 3.68. (Fobs = 31.0177453) > (3.68 = Fcrit). Therefore, the null hypothesis is rejected. The second, unrestricted
model fits significantly better.
USING RSS (Residual SS): To check if the Cobb-Douglas production function is the correct model, we can test the hypothesis
H0: β3 = β4 = 0.
F = [(0.013604 − 0.002649)/((20 − 2 − 1) − (20 − 4 − 1))] / [0.002649/(20 − 4 − 1)] = 31.01642129 ~ F(2, 15)
The critical value is 3.68. (Fobs = 31.01642129) > (3.68 = Fcrit). Therefore, the null hypothesis is rejected. The second, unrestricted
model fits significantly better. (The small difference from the R²-based value, 31.018 vs. 31.016, is only due to rounding of the reported R² and RSS figures.)
F Test & Nested Models
Thus, easiest to use these formulas
(or most convenient to use a software and give the command for a Wald test)

• Using R²: F = [(R²_Unrestricted − R²_Restricted)/(df_Restricted − df_Unrestricted)] / [(1 − R²_Unrestricted)/df_Unrestricted]
• Using RSS (Residual Sum of Squares): F = [(RSS_Restricted − RSS_Unrestricted)/(df_Restricted − df_Unrestricted)] / [RSS_Unrestricted/df_Unrestricted]


s_i = educational attainment, i.e., the highest level of education that an individual has completed.
The F test of the overall regression is significant at 5%. However, note the
low R² = 0.007. Even if the F test is significant, saying that at least one of the
explanatory variables' coefficients is different from zero, other tests indicate
that this is not an optimal model: only regne (region north east) and regw (region west)
are significant in the t-tests, while the t-test for regnc (north central) is highly insignificant.
Maybe educational attainment is affected more by whether a student comes from an urban or a rural area
than by whether the person comes from regne, regnc, or regw?

We include a dummy variable that indicates whether the person is from an
urban area (then this variable is equal to 1) or from a rural area (then this
variable is equal to 0).
We formulate an unrestricted and a restricted model –
where the restricted model sets restrictions for
β2=β3=β4=0, but not for β5 (that is the parameter for the
urban dummy variable) in the restricted model. (In the
unrestricted model all parameters are estimated).

RSS = Residual SS
The unrestricted model with all variables – we save its RSS (Residual SS).
d.f. (UR) = n − k − 1 = 1200 − 4 − 1 = 1195 (where k = number of slopes = 4)

The restricted model with only the urban dummy variable – we save its RSS (Residual SS).
d.f. (R) = n − k − 1 = 1200 − 1 − 1 = 1198 (where k = number of slopes = 1)

We substitute these two RSS values into the F test below.
Numerator d.f. = number of restrictions; β2, β3, and β4 are three restrictions = 3.
Denominator d.f. for the unrestricted model = n − k − 1 = 1200 − 4 − 1 = 1195, since we have 4 slope
coefficients. Or, F(d.f. difference restricted vs. unrestricted, d.f. unrestricted) = F(3, 1195).

Since Fobs = 1.59 < 2.61 = Fcrit, we cannot reject H0.
We can choose the restricted model, where the regions are not important. Based on this model, what
mainly determines educational attainment is whether the person comes from an urban or a rural location.
There is no value added from regne, regnc, or regw that is not already explained by the urban dummy.
In EViews:
series sparents = sm + sf
ls s c sparents

Numerator d.f. = number of restrictions; β2 = β3 is one restriction = 1, since the two betas become one
common β (or 1052 − 1051 = 1). Denominator d.f. for the unrestricted model = n − k − 1 = 1054 − 2 − 1 = 1051,
since we have 2 slope coefficients. Or, F(d.f. difference restricted vs. unrestricted, d.f. unrestricted) = F(1, 1051).

Since Fobs = 0.082 < 3.86 = Fcrit, we cannot reject H0.
We can choose the restricted model, since it is likely that β2 = β3; that is, H0: β2 = β3 is not unlikely,
so there is no point in separating sm and sf. Instead we can use sparents = sm + sf as a single variable.
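The same restriction can be tested in Stata with a Wald/F test after estimating the unrestricted model (a sketch, assuming the unrestricted regression of s on sm and sf as in the slide):

reg s sm sf      // unrestricted model with separate mother's (sm) and father's (sf) schooling
test sm = sf     // H0: beta_sm = beta_sf, reported as F(1, n-2-1)
* if H0 is not rejected, sm and sf can be replaced by sparents = sm + sf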
OLS with Standardized Variables
• Problem:
• Scale effects occur when variables have disparate units, means, and variances.
• Remedy:
• Standardize the variables to neutralize scale effects.
• Transformation:
• Variables are adjusted to have a mean of 0 and a variance of 1 for normalization.
• Interpretation:
• E.g. a 1 SD increase in an independent variable gives a β* SD change in the dependent variable.
• Purpose:
• Allows for the comparison of the relative importance of each predictor variable within the model.
• A higher standardized beta coefficient (also known as beta weights) implies an independent variable has a
greater influence on the dependent variable.
• Thus, the magnitude of the beta weights (the absolute value of the standardized coefficient) signifies the
predictor's association strength with the dependent variable. A higher beta implies a higher relative strength.
• Unstandardized coefficient may be misleading:
• Coefficients from non-standardized variables may unintentionally capture scale variances instead of the true
effect.
• Standardization corrects this, allowing for an “apples-to-apples” comparison of the model's coefficients.
Note 1: When discussing "the impact on the dependent variable," we are referring to how the standardized beta weights measure the expected change in the outcome
variable given a one SD change in the independent variables. This effect is assessed while holding other variables in the model constant. Beta weights help us interpret the
strength and direction of the relationship between each predictor and the outcome variable in a standardized scale, making it easier to identify which predictors are most
influential. Note 2: Multicollinearity can distort the analysis – just as for unstandardized variables – and we cannot separate the explanatory effect of the independent
variables. Note 3: Correlation and R2 are the same after standardization.
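In Stata, standardized (beta) coefficients can also be requested directly with the beta option of regress, instead of standardizing each variable manually; a sketch on Stata's built-in auto data (not the course data):

sysuse auto, clear
regress price mpg weight, beta   // the extra "Beta" column shows the standardized coefficients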
OLS with Standardized Variables - example
WITHOUT STANDARDIZATION: Investment Returns = β0 + β1 × (Market Risk Premium) + β2 × (Book−to−Market Ratio) + ε
• Coefficients:
• β1 (Market Risk Premium): 1.2
• β2 (Book-to-Market Ratio): 0.03
• Interpretation:
A one-unit increase in market risk premium is associated with a 1.2 unit increase in investment returns.
An increase of one unit in book-to-market ratio is associated with a 0.03 unit increase in investment returns.
• Misleading Conclusion: It may appear that market risk premium has a 40 times greater impact on investment returns than book-
to-market ratio (1.2 vs. 0.03). (But these cannot be compared in this way without standardization).

WITH STANDARDIZATION: Investment Returns = β0 + β1* × (Std. Market Risk Premium) + β2* × (Std. Book−to−Market Ratio) + ε
• Standardized Coefficients:
• β1* (Standardized Market Risk Premium): 0.5
• β2* (Standardized Book-to-Market Ratio): 0.6
• Interpretation:
• A 1 SD increase in market risk premium is associated with a 0.5 SD increase in investment returns.
• A 1 SD increase in the book-to-market ratio is associated with a 0.6 SD increase in investment returns.
• Corrected Conclusion:
• Book-to-market ratio has a greater standardized impact on investment returns than market risk premium (0.6 vs. 0.5).

Through standardization, the coefficients are rescaled so that they can be compared directly, revealing that the book-to-market
ratio has a slightly greater impact on investment returns than market risk premium when comparing the effects in terms of
standard deviations. This contrasts with the initial, potentially misleading conclusion drawn from non-standardized coefficients.
OLS with Standardized Variables (Creditscore standardized coef.wf1/dta)

Age (AGE) | Savings SEK (SAVINGS_SEK) | Savings USD (SAVINGS_USD) | Credit Score (CREDITSCORE)
76.46 207202.18 20720.22 664.79
56.00 272713.68 27271.37 651.31
64.68 238051.89 23805.19 645.25
83.61 206083.75 20608.38 649.14
78.01 222193.16 22219.32 661.15
35.34 216683.72 21668.37 632.00
64.25 274703.95 27470.40 666.45
47.73 189742.09 18974.21 652.10
48.45 215653.39 21565.34 637.04
56.16 157295.21 15729.52 643.48

'In EViews, TO STANDARDIZE ALL VARIABLES (mean = 0 and stdev = 1, using @mean(y) and @stdev(y)): series y_star = (y - @mean(y)) / @stdev(y)
'Standardizing the variables - and adding the suffix "_STAR" to the new standardized variables
series AGE_star = (AGE - @mean(AGE)) / @stdev(AGE)
series SAVINGS_SEK_star = (SAVINGS_SEK - @mean(SAVINGS_SEK)) / @stdev(SAVINGS_SEK)
series SAVINGS_USD_star = (SAVINGS_USD - @mean(SAVINGS_USD)) / @stdev(SAVINGS_USD)
series CREDITSCORE_star = (CREDITSCORE - @mean(CREDITSCORE)) / @stdev(CREDITSCORE)
'Running the Unstandardized OLS regressions
ls CREDITSCORE c AGE SAVINGS_SEK
ls CREDITSCORE c AGE SAVINGS_USD
'Running the Standardized OLS regressions
ls CREDITSCORE_star c AGE_star SAVINGS_SEK_star
ls CREDITSCORE_star c AGE_star SAVINGS_USD_star

Standardization of a variable y: Z_j = (y_j − Mean(y)) / St.dev.(y); e.g. for AGE: AGE*_j = (AGE_j − Mean(AGE)) / St.dev.(AGE).
Note: After standardizing, the intercept will be zero.

Estimated unstandardized regressions:
Credit Score = β0 + 0.487 × Age + 0.000090058 × Savings_SEK
Credit Score = β0 + 0.487 × Age + 0.00090058 × Savings_USD
Estimated standardized regressions:
Credit Score_star = 0.6512 × Age_star + 0.2795 × Savings_SEK_star
Credit Score_star = 0.6512 × Age_star + 0.2795 × Savings_USD_star

Note: Savings_SEK gives a 10 times smaller coefficient than Savings_USD: 0.000090058 (= 9.00E-05) < 0.00090058, i.e. a 10 times lower coefficient for SEK than for USD.
Despite the different scales (currencies), standardization makes the coefficients for SAVINGS_USD_STAR and SAVINGS_SEK_STAR identical and comparable with AGE_STAR.
Standardization solves the problem, but note that the SE, t-stat, F-stat, R², etc. are the same in the models.
OLS with Standardized Variables (Creditscore standardized coef.wf1/dta) – the same data and example in Stata:

* In Stata
* Standardizing the variables - and adding the suffix "_star" to the new standardized variables
egen age_star = std(age)
egen savings_sek_star = std(savings_sek)
egen savings_usd_star = std(savings_usd)
egen creditscore_star = std(creditscore)

* Running the Unstandardized OLS regressions
reg creditscore age savings_sek
reg creditscore age savings_usd

* Running the Standardized OLS regressions
reg creditscore_star age_star savings_sek_star
reg creditscore_star age_star savings_usd_star

The estimated equations and the conclusions are the same as on the previous slide: the unstandardized SEK coefficient is 10 times smaller than the USD coefficient, while the standardized coefficients are identical (0.6512 for Age_star and 0.2795 for the savings variable) and therefore directly comparable.
OLS with Standardized Independent Variables
• Standardization of an independent variable Xj (excluding the intercept) is a linear
transformation, based on itself and the intercept, as shown below:
X*_j = (Xj − X̄j) / s_Xj, where s_Xj is the standard deviation of the independent variable Xj in the dataset.
• The consequences of standardizing the independent variable Xj are as follows:
– It does not affect the estimators of the other slopes.
– The slope estimator of Xj becomes β̂*_j = s_Xj · β̂_j.
– The new intercept becomes β̂*_0 = β̂_0 + β̂_j · X̄_j.
– The R² of the model remains the same.
– All t test statistics of the slopes remain the same.
– The t test statistic for the intercept is different.
• The OLS estimator of β0 is β̂0 = Ȳ − β̂1·X̄1 − β̂2·X̄2 − … − β̂p·X̄p; therefore, if all
independent variables are standardized, the new intercept becomes β̂*_0 = Ȳ.
OLS with Standardized Dependent Variables
• Standardization of the dependent variable Y is a linear transformation, based on itself and the
intercept, as shown below:
Y* = (Y − Ȳ) / s_Y, where s_Y is the standard deviation of the dependent variable Y in the dataset.
• The consequences of standardizing the dependent variable Y are as follows:
– All slope estimators are affected. They become β̂*_k = β̂_k / s_Y (the main difference vs. standardizing independent variables).
– The estimator of the intercept becomes β̂*_0 = (β̂_0 − Ȳ) / s_Y.
– The R² of the model remains the same.
– The t test statistics of the slopes remain the same.
– The t test statistic for the intercept is different.
• If all independent variables are also standardized, the intercept and slopes with standardized variables
become β̂*_0 = 0 (regression through the origin) and β̂*_k = (s_Xk / s_Y) · β̂_k.
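A quick numerical check of the last relation, β̂*_k = (s_Xk / s_Y)·β̂_k, sketched on Stata's built-in auto data (not the course data):

sysuse auto, clear
quietly regress price weight
scalar b_w = _b[weight]              // unstandardized slope
quietly summarize weight
scalar s_x = r(sd)                   // standard deviation of X
quietly summarize price
scalar s_y = r(sd)                   // standard deviation of Y
display "implied standardized slope = " b_w*s_x/s_y
regress price weight, beta           // the displayed Beta column should match the value above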
OLS with Standardized Variables (both X & Y) - Example
* In Stata:
* Generating synthetic data (both Y and X standardized. Different unstandardized beta coefs. Standardized beta coefs identical despite currency)
clear
set obs 100
set seed 0
gen X_SEK = rnormal(100, 20)    // Normal distribution, Mean=100, StDev=20
gen Y = 5 + 2*X_SEK + rnormal(0, 10)

* Creating USD variable with an exchange rate of 10 SEK = 1 USD
gen X_USD = X_SEK * 0.1

* Running unstandardized regressions
regress Y X_SEK
regress Y X_USD

* Standardizing variables
egen mean_X_SEK = mean(X_SEK)
egen sd_X_SEK = sd(X_SEK)
egen mean_Y = mean(Y)
egen sd_Y = sd(Y)
gen X_SEK_std = (X_SEK - mean_X_SEK) / sd_X_SEK
gen Y_std = (Y - mean_Y) / sd_Y

* Standardizing USD variable
egen mean_X_USD = mean(X_USD)
egen sd_X_USD = sd(X_USD)
gen X_USD_std = (X_USD - mean_X_USD) / sd_X_USD

* Running standardized regressions
regress Y_std X_SEK_std
regress Y_std X_USD_std

Standardize both Y and X: the easiest marginal interpretation (compared to only standardizing the X variables). If X increases by 1 standard deviation → then Y increases by β standard deviations.
OLS with Standardized Independent Variables - Example
* In Stata:
* Generating synthetic data (only X standardized. Different unstandardized beta coefs. Standardized beta coefs identical despite currency)
clear
set obs 100
set seed 0
gen X_SEK = rnormal(100, 20)    // Normal distribution, Mean=100, StDev=20
gen Y = 5 + 2*X_SEK + rnormal(0, 10)

* Creating USD variable with an exchange rate of 10 SEK = 1 USD
gen X_USD = X_SEK * 0.1

* Running unstandardized regressions
regress Y X_SEK
regress Y X_USD

* Standardizing independent variables
egen mean_X_SEK = mean(X_SEK)
egen sd_X_SEK = sd(X_SEK)
gen X_SEK_std = (X_SEK - mean_X_SEK) / sd_X_SEK

egen mean_X_USD = mean(X_USD)
egen sd_X_USD = sd(X_USD)
gen X_USD_std = (X_USD - mean_X_USD) / sd_X_USD

* Running regressions with standardized independent variables
regress Y X_SEK_std
regress Y X_USD_std

If we only standardize X: if X increases by one standard deviation → then Y increases by β units of Y. Less intuitive interpretation (than having both variables standardized).
OLS with Standardized Dependent Variables - Example
* In Stata:
* Generating synthetic data (SKIP THIS EXAMPLE – JUST FOR COMPARISON)
clear
set obs 100
set seed 0
gen X_SEK = rnormal(100, 20)    // Normal distribution, Mean=100, StDev=20
gen Y = 5 + 2*X_SEK + rnormal(0, 10)

* Creating USD variable with an exchange rate of 10 SEK = 1 USD
gen X_USD = X_SEK * 0.1

* Running unstandardized regressions
regress Y X_SEK
regress Y X_USD

* Standardizing dependent variable
egen mean_Y = mean(Y)
egen sd_Y = sd(Y)
gen Y_std = (Y - mean_Y) / sd_Y

* Running regressions with standardized dependent variable
regress Y_std X_SEK
regress Y_std X_USD

If we only standardize Y: if X increases by one unit (e.g. USD or SEK) → then Y increases by β standard deviations. Less intuitive interpretation (than having both variables standardized).
OLS & Functional Forms of Regression
• Linearity in parameters is an OLS assumption.
• A model that is non-linear in its parameters cannot be estimated by OLS.
• But a non-linear model that is a so-called “intrinsically linear model” can be estimated by OLS.
• An intrinsically linear model is a non-linear model that can be transformed into a linear model.
• For instance, y = β0·x^β1 can after a logarithmic transformation become linear:
ln(y) = ln(β0) + β1·ln(x) (that is, intrinsically linear in the parameters).
• The logarithm changes the scale of the variable but does not make the model non-linear
in the parameters (only non-linear in the variables, which is not necessarily a problem for OLS).
• Examples:
• (i) Intrinsically linear (OLS works): for y = β0·x^β1, take the natural logarithm of both sides to
get ln(y) = ln(β0) + β1·ln(x). This is a linear regression model in the logarithmic scale, and we can
estimate ln(β0) and β1 using OLS.
• (ii) Not intrinsically linear (OLS does not work): CANNOT be rewritten as linear in its parameters.
Consider a model where the dependent variable y is related to the independent variable x through an
exponential term and a polynomial term: y = β0·e^(β1·x) + β2·x². The sum of the exponential term and
the polynomial term makes it impossible to express this equation as a linear combination of the
parameters β0, β1 and β2. Therefore, this model cannot be rewritten as linear in its parameters and
requires a non-linear estimation method. We usually use some form of maximum likelihood estimation,
or other non-linear regression techniques, since OLS does not work for models that are non-linear in the parameters.
• Non-linear regression models cannot use the explicit (closed-form) solutions of the OLS method. They should be estimated
numerically (iteratively), using methods such as gradient descent, Gauss-Newton and Newton-Raphson.
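A sketch of both cases in Stata (y and x are hypothetical variables, not course data): the intrinsically linear model y = β0·x^β1 is estimated by OLS after taking logs, while the model that is non-linear in its parameters needs an iterative estimator, here illustrated with Stata's nl (non-linear least squares, which uses Gauss-Newton-type iterations):

* (i) Intrinsically linear: OLS on the log-log transformation
gen ln_y = ln(y)
gen ln_x = ln(x)
regress ln_y ln_x                               // slope estimates beta1; intercept estimates ln(beta0)

* (ii) Non-linear in the parameters: iterative non-linear least squares
nl (y = {b0=1}*exp({b1=0.1}*x) + {b2=0}*x^2)    // y = b0*e^(b1*x) + b2*x^2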
Common Linear Models (based on transformations)
When we cannot linearize a function by a transformation, we cannot use OLS estimation (we need a
non-linear estimator such as e.g. ML). However, if we can linearize a non-linear relationship by
transforming the variables, we can sometimes still use OLS. Logs are often useful for linearization.
Model | Functional Form | Linearized Form | Marginal Effect at X = x0 | Elasticity at X = x0
Linear | Y = β0 + β1·X + ε | Y = β0 + β1·X + ε | β1 | β1·(x0/y0)
Log-linear (log-log) | Y = A·X^β1·u | ln(Y) = β0 + β1·ln(X) + ε | β1·(y0/x0) | β1
Log-lin | Y = e^(β0 + β1·X + ε), or Y = A·e^(β1·X)·u | ln(Y) = β0 + β1·X + ε | β1·y0 | β1·x0
Lin-log | e^Y = A·X^β1·u | Y = β0 + β1·ln(X) + ε | β1·(1/x0) | β1·(1/y0)
Reciprocal | Y = β0 + β1·(1/X) + u | Y = β0 + β1·(1/X) + ε | −β1·(1/x0²) | −β1·(1/(x0·y0))
Log Reciprocal | Y = e^(β0 + β1·(1/X) + u) | ln(Y) = β0 + β1·(1/X) + ε | −β1·(y0/x0²) | −β1·(1/x0)
Logit | Y = e^(β0 + β1·X + ε) / (1 + e^(β0 + β1·X + ε)) | ln(Y/(1 − Y)) = β0 + β1·X + ε | β1·y0·(1 − y0) | β1·x0·(1 − y0)

Note: In a linear model, the marginal effect is constant and the elasticity varies depending on the values of X and Y.
In contrast, in a log-linear (log-log) model, the elasticity is constant and the marginal effect varies with Y. In a lin-log model, the marginal effect changes with X, and the elasticity varies with Y.
Slope Interpretation
Some commonly used linear models are shown in the following table. The slope
of X is interpreted differently, as shown in the table.

Model | Functional Form | Linearized Form | Slope (β1) Interpretation
Linear | Y = β0 + β1·X + ε | Y = β0 + β1·X + ε | A 1-unit increase in X results in a β1-unit change in the average of Y.
Log-linear (log-log) | Y = A·X^β1·u | ln(Y) = β0 + β1·ln(X) + ε | A 1% increase in X results in a β1% change in Y (roughly, on average).
Log-lin | Y = e^(β0 + β1·X + ε), or Y = A·e^(β1·X)·u | ln(Y) = β0 + β1·X + ε | A 1-unit increase in X results in a (e^β1 − 1)×100% change in Y (roughly, on average).
Lin-log | e^Y = A·X^β1·u | Y = β0 + β1·ln(X) + ε | A 1% increase in X results in a β1·ln(1.01) ≈ 0.01·β1 unit change in the average of Y.
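For example (illustrative numbers, not from the course data): in a log-lin model with β1 = 0.05, a 1-unit increase in X changes Y by (e^0.05 − 1)×100% ≈ 5.13%, close to the rough approximation 100·β1 = 5%; in a lin-log model with β1 = 200, a 1% increase in X changes Y by approximately 0.01·200 = 2 units.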

See many practical examples in the lab notes – both manual calculations and computer demonstrations

Note in the previous slide: The reciprocal model captures a non-linear, sharply diminishing absolute impact of X on Y as X grows,
while the lin-log model maintains a constant impact on Y for each percentage increase in X, leading to diminishing absolute changes in Y if Y grows with X
