Lesson 4: Linear Assumptions
Models (4433LGLM6Y)
Vahe Avagyan
Biometris, Wageningen University and Research
Overview
• SLID data
• wages: Composite hourly wage rate ($/hour).
• age: in years.
• sex: dummy variable (1 = male, 0 = female).
• education: Completed years of education.
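A minimal sketch of the working SLID regression in Python with statsmodels; it assumes the data have been exported to a CSV file named SLID.csv (e.g., from R's carData package) with the columns listed above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Load the SLID data (hypothetical CSV export) and drop rows with missing values
slid = pd.read_csv("SLID.csv").dropna(subset=["wages", "age", "sex", "education"])

# Working model used throughout: wages regressed on age, sex and education
fit = smf.ols("wages ~ age + sex + education", data=slid).fit()
print(fit.summary())
```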
Model assumptions
• For a linear model to be a good model, there are four conditions that need to be fulfilled.
• Linearity: the relationship between the variables can be described by a linear equation (also called additivity).
• Equal variance: the residuals have equal variance (also called homoskedasticity).
• Normality: the errors are normally distributed.
• Independence: the errors are independent of one another.
Normality
• A quantile-comparison plot can give us a sense of which observations depart from normality.
• QQ-plot of residuals:
• Plot the studentized residuals $E_i^*$ against the normal or $t_{n-k-2}$ distribution.
• The difference between the two is important for small samples.
• In larger samples, internally studentized residuals or raw residuals will give the same impression.
• A QQ-plot is effective in displaying tail behaviour.
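A minimal sketch of the QQ-plot of externally studentized residuals against the $t_{n-k-2}$ distribution, assuming the `fit` object from the earlier sketch.

```python
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

# Externally studentized residuals E_i* of the fitted model
resid_stud = OLSInfluence(fit).resid_studentized_external
n, k = int(fit.nobs), int(fit.df_model)

# Compare against the t distribution with n - k - 2 degrees of freedom
sm.qqplot(resid_stud, dist=stats.t, distargs=(n - k - 2,), line="45")
plt.show()
```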
• Error variance:
$$V(\epsilon) = V(Y \mid x_1, \ldots, x_k) = \sigma_\epsilon^2$$
• Note: the LS estimator $\mathbf{b}$ remains unbiased and consistent even with nonconstant variance.
• However, its efficiency is impaired (we can do better) and the usual formulas for standard errors are inaccurate.
• The harm produced by heteroscedasticity is relatively mild: worry if the largest variance is more than about 4 times the smallest (i.e., the sd of the errors varies by more than a factor of 2).
Graphical check of constant variance
1. $E_i$ versus $\hat{y}_i$
4. $E_i'$ versus $h_i$
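A minimal sketch of plots 1 and 4, assuming the `fit` object from the earlier sketch and reading $E_i'$ as the internally studentized residual and $h_i$ as the leverage (hat value).

```python
import matplotlib.pyplot as plt

influence = fit.get_influence()
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Plot 1: residuals E_i against fitted values y_hat_i
axes[0].scatter(fit.fittedvalues, fit.resid, s=10)
axes[0].axhline(0, linestyle="--", color="grey")
axes[0].set_xlabel("Fitted values")
axes[0].set_ylabel("Residuals")

# Plot 4: internally studentized residuals E_i' against leverages h_i
axes[1].scatter(influence.hat_matrix_diag, influence.resid_studentized_internal, s=10)
axes[1].set_xlabel("Leverage h_i")
axes[1].set_ylabel("Studentized residuals")
plt.show()
```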
Nonlinearity
• $E(\epsilon) = 0$ implies that the regression surface accurately reflects the dependency $E(Y \mid x_1, \ldots, x_k)$.
• Partial residual for the $j$-th predictor: $E_i^{(j)} = E_i + B_j X_{ij}$; plot $E_i^{(j)}$ against $X_{ij}$ (component-plus-residual plot).
• SLID regression: the solid lines show the lowess smooths, the broken lines are least-squares fits.
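One way to draw these component-plus-residual plots in Python is statsmodels' `plot_ccpr`; a minimal sketch, assuming the `fit` and `slid` objects from the earlier sketches (the lowess smooth shown in the slides would be added on top).

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Component-plus-residual (partial residual) plot for education
fig = sm.graphics.plot_ccpr(fit, "education")
plt.show()

# Equivalent manual computation of the partial residuals E_i^(j) = E_i + B_j * X_ij
partial_resid = fit.resid + fit.params["education"] * slid["education"]
```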
Overview
• Log transformation $Y \to \ln Y$.
• The aim of the Box-Cox transformations is to ensure the usual assumptions of the linear model hold:
$$Y_i^{(\lambda)} = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \epsilon_i,$$
where
$$Y_i^{(\lambda)} = \begin{cases} \dfrac{Y_i^{\lambda} - 1}{\lambda}, & \text{for } \lambda \neq 0, \\ \ln Y_i, & \text{for } \lambda = 0. \end{cases}$$
• The transformation parameter $\lambda$ is estimated by maximum likelihood; test $H_0 : \lambda = \lambda_0$.
• SLID regression: 95% confidence interval $\mathrm{CI}_{0.95}(\lambda)$ for the transformation parameter.
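A minimal sketch of the Box-Cox profile log-likelihood over a grid of $\lambda$ values; the helper below is hypothetical and assumes the `slid` data frame and model formula from the earlier sketch.

```python
import numpy as np
import statsmodels.formula.api as smf

def boxcox_profile_loglik(lam, y, data, rhs="age + sex + education"):
    """Profile log-likelihood of the Box-Cox parameter lambda for a normal linear model."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_lam = np.log(y) if lam == 0 else (y**lam - 1.0) / lam   # Box-Cox transform of the response
    rss = smf.ols("y_lam ~ " + rhs, data=data.assign(y_lam=y_lam)).fit().ssr
    # Up to a constant: -n/2 * log(RSS/n) + (lambda - 1) * sum(log y)
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

grid = np.linspace(-1.0, 1.0, 81)
ll = np.array([boxcox_profile_loglik(l, slid["wages"], slid) for l in grid])
lam_hat = grid[ll.argmax()]            # maximum-likelihood estimate of lambda
ci = grid[ll > ll.max() - 1.92]        # approx. 95% CI: within chi2_1(0.95)/2 of the maximum
print(lam_hat, ci.min(), ci.max())
```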
Overview
• Generalizing the $\mathbf{X}\boldsymbol{\beta}$ part of the model by adding polynomial terms (e.g., the one-predictor case):
$$y = \beta_0 + \beta_1 X + \cdots + \beta_d X^d + \epsilon$$
• Selection of $d$ (see the sketch below):
1. Keep adding terms until the added term is no longer statistically significant.
2. Start with a large $d$ and eliminate non-significant terms, starting with the highest-order term.
• Principle of marginality: do not remove lower-order terms from the model, even if they are not statistically significant.
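A minimal sketch of strategy 1, adding polynomial terms and testing each added term with an F-test for nested models; it assumes the `slid` data frame from the earlier sketch and uses wages versus education purely as an illustration.

```python
import numpy as np
import statsmodels.api as sm

x = slid["education"].to_numpy(dtype=float)
y = slid["wages"].to_numpy(dtype=float)

# Fit raw polynomials of increasing degree d
fits = {}
for d in range(1, 5):
    X = np.column_stack([x**p for p in range(1, d + 1)])   # columns X, X^2, ..., X^d
    fits[d] = sm.OLS(y, sm.add_constant(X)).fit()

# Compare nested models: is the added highest-order term statistically significant?
for d in range(2, 5):
    f_stat, p_val, _ = fits[d].compare_f_test(fits[d - 1])
    print(f"degree {d-1} -> {d}: F = {f_stat:.2f}, p = {p_val:.4f}")
```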
Polynomial regression
Orthogonal polynomials
• With raw polynomial terms, when a term is removed from or added to the model the remaining coefficients change, and the model needs to be refitted.
• With orthogonal polynomial terms the columns of the design matrix are mutually orthogonal, so lower-order coefficient estimates do not change when higher-order terms are added or dropped (see the sketch below).
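A minimal sketch of an orthogonal polynomial basis (an analogue of R's `poly()`), built here by a QR decomposition of the centred Vandermonde matrix; the helper name and the choice of education as predictor are illustrative only, assuming the `slid` data frame from the earlier sketch.

```python
import numpy as np
import statsmodels.api as sm

def ortho_poly(x, degree):
    """Orthonormal polynomial basis via QR of the centred Vandermonde matrix."""
    x = np.asarray(x, dtype=float)
    V = np.vander(x - x.mean(), degree + 1, increasing=True)  # columns 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)
    return Q[:, 1:]                                            # drop the constant column

y = slid["wages"].to_numpy(dtype=float)
Z = ortho_poly(slid["education"], 3)

# Because the columns are mutually orthogonal, lower-order coefficient estimates
# do not change when the cubic term is added
fit_quad = sm.OLS(y, sm.add_constant(Z[:, :2])).fit()
fit_cubic = sm.OLS(y, sm.add_constant(Z)).fit()
print(fit_quad.params[:3])
print(fit_cubic.params[:3])   # first three coefficients agree with fit_quad
```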
• A spline is a piecewise polynomial with a certain level of smoothness. Splines fix the disadvantages of polynomial regression by combining it with segmented regression (see more in the practical session).
• Define a cubic B-spline basis $S(X)$ over $[a, b]$ using knots at $t_1, \ldots, t_k$.
• Partition $a = t_0 < t_1 < \cdots < t_k = b$; the function $S(X)$ is cubic on each subinterval $[t_i, t_{i+1}]$, i.e.,
$$S_i(X) = a_{0,i} + a_{1,i} X + a_{2,i} X^2 + a_{3,i} X^3$$
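A minimal sketch of a cubic regression spline for wages as a function of age, using patsy's `bs()` basis inside a statsmodels formula; the interior knots at ages 30, 45 and 60 are illustrative choices, not values from the slides.

```python
import statsmodels.formula.api as smf

# Cubic B-spline basis S(age) with illustrative interior knots;
# boundary knots default to the minimum and maximum of age
fit_spline = smf.ols("wages ~ bs(age, knots=(30, 45, 60), degree=3)", data=slid).fit()
print(fit_spline.summary())
```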