Lesson 4: Linear Assumptions
Models (4433LGLM6Y)
Vahe Avagyan
Biometris, Wageningen University and Research
Overview
• SLID data
• wages: Composite hourly wage rate ($/hour).
• age: in years.
• sex: dummy variable (1 = male, 0 = female).
• education: Completed years of education.
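A minimal sketch of the working SLID regression in Python with statsmodels; it assumes the data have been exported to a CSV file named SLID.csv (e.g., from R's carData package) with the columns listed above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Load the SLID data (hypothetical CSV export) and drop rows with missing values
slid = pd.read_csv("SLID.csv").dropna(subset=["wages", "age", "sex", "education"])

# Working model used throughout: wages regressed on age, sex and education
fit = smf.ols("wages ~ age + sex + education", data=slid).fit()
print(fit.summary())
```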
Model assumptions
• For a linear model to be a good model, there are four conditions that need to be fulfilled.
• Linearity: the relationship between the variables can be described by a linear equation (also called additivity).
• Equal variance: the residuals have equal variance (also called homoskedasticity).
• Normality: the errors are normally distributed.
• Independence: the errors are independent of one another.
Normality
• A quantile-comparison plot can give us a sense of which observations depart from normality.
• QQ-plot of residuals:
• Plot the studentized residuals $E_i^*$ against the normal or $t_{n-k-2}$ distribution.
• The difference between the two is important for small samples.
• In larger samples, internally studentized residuals or raw residuals will give the same impression.
• A QQ-plot is effective in displaying tail behaviour.
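A minimal sketch of the QQ-plot of externally studentized residuals against the $t_{n-k-2}$ distribution, assuming the `fit` object from the earlier sketch.

```python
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

# Externally studentized residuals E_i* of the fitted model
resid_stud = OLSInfluence(fit).resid_studentized_external
n, k = int(fit.nobs), int(fit.df_model)

# Compare against the t distribution with n - k - 2 degrees of freedom
sm.qqplot(resid_stud, dist=stats.t, distargs=(n - k - 2,), line="45")
plt.show()
```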
• Error variance:
$$V(\epsilon) = V(Y \mid x_1, \ldots, x_k) = \sigma_\epsilon^2$$
• Note: the LS estimator $\mathbf{b}$ remains unbiased and consistent even with nonconstant variance.
• However, its efficiency is impaired (we can do better) and the usual formulas for standard errors are inaccurate.
• The harm produced by heteroscedasticity is relatively mild: worry if the largest variance is more than about 4 times the smallest (i.e., the sd of the errors varies by more than a factor of 2).
Graphical check of constant variance
1. $E_i$ versus $\hat{y}_i$
4. $E_i'$ versus $h_i$
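A minimal sketch of plots 1 and 4, assuming the `fit` object from the earlier sketch and reading $E_i'$ as the internally studentized residual and $h_i$ as the leverage (hat value).

```python
import matplotlib.pyplot as plt

influence = fit.get_influence()
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Plot 1: residuals E_i against fitted values y_hat_i
axes[0].scatter(fit.fittedvalues, fit.resid, s=10)
axes[0].axhline(0, linestyle="--", color="grey")
axes[0].set_xlabel("Fitted values")
axes[0].set_ylabel("Residuals")

# Plot 4: internally studentized residuals E_i' against leverages h_i
axes[1].scatter(influence.hat_matrix_diag, influence.resid_studentized_internal, s=10)
axes[1].set_xlabel("Leverage h_i")
axes[1].set_ylabel("Studentized residuals")
plt.show()
```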
Nonlinearity
• $E(\epsilon) = 0$ implies that the regression surface accurately reflects the dependency $E(Y \mid x_1, \ldots, x_k)$.
• Partial residual for the $j$-th predictor: $E_i^{(j)} = E_i + B_j X_{ij}$; plot $E_i^{(j)}$ against $X_{ij}$ (component-plus-residual plot).
• SLID regression: the solid lines show the lowess smooths, the broken lines are least-squares fits.
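One way to draw these component-plus-residual plots in Python is statsmodels' `plot_ccpr`; a minimal sketch, assuming the `fit` and `slid` objects from the earlier sketches (the lowess smooth shown in the slides would be added on top).

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Component-plus-residual (partial residual) plot for education
fig = sm.graphics.plot_ccpr(fit, "education")
plt.show()

# Equivalent manual computation of the partial residuals E_i^(j) = E_i + B_j * X_ij
partial_resid = fit.resid + fit.params["education"] * slid["education"]
```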
Overview
• Log transformation $Y \to \ln Y$.
• The aim of the Box-Cox transformations is to ensure the usual assumptions of the linear model hold:
$$Y_i^{(\lambda)} = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \epsilon_i,$$
where
$$Y_i^{(\lambda)} = \begin{cases} \dfrac{Y_i^{\lambda} - 1}{\lambda}, & \text{for } \lambda \neq 0, \\ \ln Y_i, & \text{for } \lambda = 0. \end{cases}$$
• The transformation parameter $\lambda$ is estimated by maximum likelihood; test $H_0 : \lambda = \lambda_0$.
• SLID regression: 95% confidence interval $\mathrm{CI}_{0.95}(\lambda)$ for the transformation parameter.
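A minimal sketch of the Box-Cox profile log-likelihood over a grid of $\lambda$ values; the helper below is hypothetical and assumes the `slid` data frame and model formula from the earlier sketch.

```python
import numpy as np
import statsmodels.formula.api as smf

def boxcox_profile_loglik(lam, y, data, rhs="age + sex + education"):
    """Profile log-likelihood of the Box-Cox parameter lambda for a normal linear model."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_lam = np.log(y) if lam == 0 else (y**lam - 1.0) / lam   # Box-Cox transform of the response
    rss = smf.ols("y_lam ~ " + rhs, data=data.assign(y_lam=y_lam)).fit().ssr
    # Up to a constant: -n/2 * log(RSS/n) + (lambda - 1) * sum(log y)
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

grid = np.linspace(-1.0, 1.0, 81)
ll = np.array([boxcox_profile_loglik(l, slid["wages"], slid) for l in grid])
lam_hat = grid[ll.argmax()]            # maximum-likelihood estimate of lambda
ci = grid[ll > ll.max() - 1.92]        # approx. 95% CI: within chi2_1(0.95)/2 of the maximum
print(lam_hat, ci.min(), ci.max())
```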
Overview
• Generalizing the $\mathbf{X}\boldsymbol{\beta}$ part of the model by adding polynomial terms (e.g., the one-predictor case):
$$y = \beta_0 + \beta_1 X + \cdots + \beta_d X^d + \epsilon$$
• Selection of $d$ (see the sketch below):
1. Keep adding terms until the added term is no longer statistically significant.
2. Start with a large $d$ and eliminate non-significant terms, starting with the highest-order term.
• Principle of marginality: do not remove lower-order terms from the model, even if they are not statistically significant.
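A minimal sketch of strategy 1, adding polynomial terms and testing each added term with an F-test for nested models; it assumes the `slid` data frame from the earlier sketch and uses wages versus education purely as an illustration.

```python
import numpy as np
import statsmodels.api as sm

x = slid["education"].to_numpy(dtype=float)
y = slid["wages"].to_numpy(dtype=float)

# Fit raw polynomials of increasing degree d
fits = {}
for d in range(1, 5):
    X = np.column_stack([x**p for p in range(1, d + 1)])   # columns X, X^2, ..., X^d
    fits[d] = sm.OLS(y, sm.add_constant(X)).fit()

# Compare nested models: is the added highest-order term statistically significant?
for d in range(2, 5):
    f_stat, p_val, _ = fits[d].compare_f_test(fits[d - 1])
    print(f"degree {d-1} -> {d}: F = {f_stat:.2f}, p = {p_val:.4f}")
```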
Polynomial regression
Orthogonal polynomials
• With raw polynomial terms, when a term is removed from or added to the model the remaining coefficients change, and the model needs to be refitted.
• With orthogonal polynomial terms the columns of the design matrix are mutually orthogonal, so lower-order coefficient estimates do not change when higher-order terms are added or dropped (see the sketch below).
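A minimal sketch of an orthogonal polynomial basis (an analogue of R's `poly()`), built here by a QR decomposition of the centred Vandermonde matrix; the helper name and the choice of education as predictor are illustrative only, assuming the `slid` data frame from the earlier sketch.

```python
import numpy as np
import statsmodels.api as sm

def ortho_poly(x, degree):
    """Orthonormal polynomial basis via QR of the centred Vandermonde matrix."""
    x = np.asarray(x, dtype=float)
    V = np.vander(x - x.mean(), degree + 1, increasing=True)  # columns 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)
    return Q[:, 1:]                                            # drop the constant column

y = slid["wages"].to_numpy(dtype=float)
Z = ortho_poly(slid["education"], 3)

# Because the columns are mutually orthogonal, lower-order coefficient estimates
# do not change when the cubic term is added
fit_quad = sm.OLS(y, sm.add_constant(Z[:, :2])).fit()
fit_cubic = sm.OLS(y, sm.add_constant(Z)).fit()
print(fit_quad.params[:3])
print(fit_cubic.params[:3])   # first three coefficients agree with fit_quad
```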
• A spline is a piecewise polynomial with a certain level of smoothness. Splines fix the disadvantages of polynomial regression by combining it with segmented regression (see more in the practical session).
• Define a cubic B-spline basis $S(X)$ over $[a, b]$ using knots at $t_1, \ldots, t_k$.
• Partition $a = t_0 < t_1 < \cdots < t_k = b$; the function $S(X)$ is cubic on each subinterval $[t_i, t_{i+1}]$, i.e.,
$$S_i(X) = a_{0,i} + a_{1,i} X + a_{2,i} X^2 + a_{3,i} X^3$$
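A minimal sketch of a cubic regression spline for wages as a function of age, using patsy's `bs()` basis inside a statsmodels formula; the interior knots at ages 30, 45 and 60 are illustrative choices, not values from the slides.

```python
import statsmodels.formula.api as smf

# Cubic B-spline basis S(age) with illustrative interior knots;
# boundary knots default to the minimum and maximum of age
fit_spline = smf.ols("wages ~ bs(age, knots=(30, 45, 60), degree=3)", data=slid).fit()
print(fit_spline.summary())
```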