Tutorial 4

Sahrish Aisha Khan

2024-12-01
library(readxl)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all
## conflicts to become errors

library(modelsummary)

## `modelsummary` 2.0.0 now uses `tinytable` as its default table-drawing
## backend. Learn more at: https://vincentarelbundock.github.io/tinytable/
##
## Revert to `kableExtra` for one session:
##
## options(modelsummary_factory_default = 'kableExtra')
## options(modelsummary_factory_latex = 'kableExtra')
## options(modelsummary_factory_html = 'kableExtra')
##
## Silence this message forever:
##
## config_modelsummary(startup_message = FALSE)

library(ggfortify)
library(car)

## Warning: package 'car' was built under R version 4.4.2

## Loading required package: carData

##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some

library(estimatr)
library(lmtest)

## Loading required package: zoo

##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric

library(stargazer)

##
## Please cite as:
##
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary
## Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

airquality <- read_excel("airquality.xlsx")

airquality <- data.frame(airquality)

a. Estimate a linear regression model that explains airq from the other variables using
OLS.

#Estimate a linear regression model

reg1 <- lm(airq ~ vala + rain + coas + dens + medi, data = airquality)

b. Test the null hypothesis that average income does not affect air quality. Test the joint
hypothesis that none of the variables has an effect upon air quality.

# Null hypothesis that average income does not affect air quality
summary(reg1)

##
## Call:
## lm(formula = airq ~ vala + rain + coas + dens + medi, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.958 -9.891 -6.173 13.714 69.430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.119e+02 1.533e+01 7.301 1.53e-07 ***
## vala 8.834e-04 2.256e-03 0.392 0.6989
## rain 2.507e-01 3.435e-01 0.730 0.4726
## coas -3.340e+01 1.046e+01 -3.194 0.0039 **
## dens -1.073e-03 1.623e-03 -0.661 0.5148
## medi 5.545e-04 8.503e-04 0.652 0.5205
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.2 on 24 degrees of freedom
## Multiple R-squared: 0.3829, Adjusted R-squared: 0.2544
## F-statistic: 2.979 on 5 and 24 DF, p-value: 0.03133

The coefficient on average income (medi) is not significantly different from zero
(t = 0.652, p-value = 0.5205). All variables are jointly significant at the 5% level
(F-statistic = 2.979 on 5 and 24 degrees of freedom, p-value = 0.03133). This implies that
at least one of the regressors has a statistically significant effect on air quality.
Note that neither of these tests is reliable if there is heteroskedasticity, since the
standard errors would have been calculated incorrectly in that case.

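
As a quick check on the joint test, the F-statistic can be recomputed from the reported R-squared. This is only a sketch using the rounded numbers printed by summary(reg1) (R-squared 0.3829, 30 observations, 5 regressors), so the result agrees with the printed value only up to rounding.

```r
# Values copied from the summary(reg1) output above (rounded)
r2 <- 0.3829   # Multiple R-squared
n  <- 30       # number of observations
k  <- 5        # number of slope coefficients

# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
f_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail probability of the F(5, 24) distribution
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_value), 4))
```

The result should be close to the reported F-statistic of 2.979 and p-value of 0.03133.
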
c. Use the command autoplot to graphically inspect the residuals; is there any sign of
heteroskedasticity?

# Graphical inspection of residuals to check for heteroskedasticity

autoplot(reg1) +
  theme_bw(base_size = 14)
plot(reg1)

Yes, we can observe a fanning-in pattern in the residuals (their spread is not constant), so the
errors are potentially heteroskedastic.

d. Perform a Breusch-Pagan test for heteroskedasticity related to all five explanatory
variables.
bptest(model)

#BP TEST
bptest(reg1)

##
## studentized Breusch-Pagan test
##
## data: reg1
## BP = 3.1416, df = 5, p-value = 0.6782

We fail to reject the null hypothesis that the residuals are homoskedastic. Since the p-value
(0.6782) is not less than 0.05, we do not have sufficient evidence to say that
heteroskedasticity is present in the regression model.
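
To make the mechanics of the test concrete, here is a minimal sketch of the studentized (Koenker) Breusch-Pagan statistic computed by hand in base R. It uses simulated data, not airquality.xlsx, and the names x1, x2, fit and aux are made up for this illustration. The statistic is n times the R-squared of a regression of the squared residuals on the explanatory variables.

```r
set.seed(1)

# Simulated data (an assumption for illustration): error spread depends on x1
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = exp(0.5 * x1))

# Main regression and its squared residuals
fit <- lm(y ~ x1 + x2)
u2  <- residuals(fit)^2

# Auxiliary regression of squared residuals on the same regressors
aux <- lm(u2 ~ x1 + x2)

# Studentized BP statistic: n * R^2, chi-square with df = number of z variables
bp_stat <- n * summary(aux)$r.squared
bp_df   <- 2
bp_pval <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(BP = bp_stat, df = bp_df, p.value = bp_pval))
```

With lmtest loaded, bptest(fit) on the same simulated data should report the same statistic, since bptest studentizes by default.
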
# you can also define the z vars heteroskedasticity depends on, for example:
bptest(reg1, ~ vala + rain, data = airquality)

##
## studentized Breusch-Pagan test
##
## data: reg1
## BP = 0.19782, df = 2, p-value = 0.9058

Here, instead of testing heteroskedasticity across all predictors, the test specifically
checks whether the variance of the residuals depends on these two variables. Again, since the
p-value (0.9058) is not less than 0.05, we do not have sufficient evidence to say that
heteroskedasticity related to the variables vala and rain is present in the regression model.

e. Perform a White test for heteroskedasticity. How reliable is this test given that we have
30 observations, and how many degrees of freedom does the chi-square distribution have?
bptest(model, ~ all vars that need to be included, data=dataset)

#WHITE TEST
bptest(reg1, ~ vala + rain + coas + dens + medi + I(vala^2) +
I(rain^2) + I(dens^2)+ I(medi^2), data = airquality)

##
## studentized Breusch-Pagan test
##
## data: reg1
## BP = 5.4355, df = 9, p-value = 0.7948

The White test examines heteroskedasticity in the residuals by allowing the variance to depend
on all predictors and their squares (the square of the dummy coas equals coas itself, so it is
not added). It is a more general test than the Breusch-Pagan test. The auxiliary regression
contains 9 terms, so the statistic is compared with a chi-square distribution with 9 degrees
of freedom. Since the p-value (0.7948) is higher than 0.05, we fail to reject the null
hypothesis. This suggests no evidence of heteroskedasticity in the model.
The White test is generally reliable for larger sample sizes, as it relies on asymptotic
properties (large-sample approximations). With only 30 observations and 9 terms in the
auxiliary regression, the test results may be unreliable and the test has little power to
detect heteroskedasticity.
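
The degrees of freedom and p-values of the two bptest calls can be checked with a few lines of base R. This sketch only reuses numbers printed in the output above (BP = 3.1416 with df = 5, BP = 5.4355 with df = 9); the vector aux_terms is a made-up helper that lists the terms of the White-type auxiliary regression.

```r
# Terms included in the White-type auxiliary regression above
aux_terms <- c("vala", "rain", "coas", "dens", "medi",
               "I(vala^2)", "I(rain^2)", "I(dens^2)", "I(medi^2)")

n        <- 30
white_df <- length(aux_terms)     # chi-square degrees of freedom
resid_df <- n - white_df - 1      # what is left in the auxiliary regression

# Upper-tail chi-square probabilities for the reported statistics
p_bp    <- pchisq(3.1416, df = 5, lower.tail = FALSE)
p_white <- pchisq(5.4355, df = white_df, lower.tail = FALSE)

print(c(white_df = white_df, resid_df = resid_df))
print(round(c(p_bp = p_bp, p_white = p_white), 4))
```

Only 20 residual degrees of freedom remain in the auxiliary regression, which is one way to see why the large-sample approximation is questionable here.
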

f. Perform a Goldfeld-Quandt test for equality of the variance between SMSAs on
the coast (coas=1) and not on the coast (coas=0)

- first find the percentage of observations with coas=0

summary(airquality$coas)
- Then use this command
gq_test <- gqtest(model, order.by = ~ coas, point=0.3, data = airquality)
print(gq_test)

how do you interpret the results? Can you understand why df1 = 15, df2 = 3?

#Goldfeld-Quandt test
summary(airquality$coas)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 0.0 0.0 1.0 0.7 1.0 1.0

gq_test <- gqtest(reg1, order.by = ~ coas, point=0.3, data = airquality)
print(gq_test)

##
## Goldfeld-Quandt test
##
## data: reg1
## GQ = 27.244, df1 = 15, df2 = 3, p-value = 0.009802
## alternative hypothesis: variance increases from segment 1 to 2

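Since the p-value (0.009802) is less than 0.05, we reject the null hypothesis of equal
variances. Therefore, there is significant evidence of heteroskedasticity. Specifically, the
variance of the residuals is higher in the second segment (coas=1) than in the first (coas=0).
The degrees of freedom of each group are
df = number of observations in the group − number of estimated parameters (k = 6).
The mean of coas is 0.7, so 21 of the 30 observations have coas=1 and 9 have coas=0, which
gives df1 = 21 − 6 = 15 and df2 = 9 − 6 = 3.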
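
The degrees of freedom can be reproduced with simple arithmetic. This is a sketch that relies only on numbers shown above (30 observations, mean of coas equal to 0.7, six estimated coefficients, GQ = 27.244); the variable names are made up for the illustration.

```r
n          <- 30
share_coas <- 0.7   # mean of the dummy coas from summary(airquality$coas)
k          <- 6     # intercept plus five slope coefficients

# Group sizes after sorting by coas: non-coastal first, coastal second
n_coast    <- round(share_coas * n)
n_noncoast <- n - n_coast

# The numerator refers to the second segment, the denominator to the first
df1 <- n_coast - k
df2 <- n_noncoast - k

# Upper-tail F probability for the reported GQ statistic
p_gq <- pf(27.244, df1 = df1, df2 = df2, lower.tail = FALSE)

print(c(df1 = df1, df2 = df2))
print(round(p_gq, 4))
```

With only 3 degrees of freedom in the denominator, the variance of the non-coastal group is estimated very imprecisely, so the result should be read with some caution.
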
# suppose the order.by variable was continuous, for example medi
gq_test2 <- gqtest(reg1, order.by = ~ medi, point=0.5, fraction=0.2,
data = airquality)
# point=0.5 means split sample evenly
# fraction=0.2: 20% of observations are excluded from the middle of the sorted
# dataset, leaving two smaller subsets for the comparison of variances.
print(gq_test2)

##
## Goldfeld-Quandt test
##
## data: reg1
## GQ = 7.1341, df1 = 6, df2 = 6, p-value = 0.01532
## alternative hypothesis: variance increases from segment 1 to 2


g. Both the BP and the White test have quite a generic alternative hypothesis and therefore have
low power. Assume now that heteroskedasticity is multiplicative.
Plot the residuals against the various regressors (one at a time) to see
which variables heteroskedasticity may depend upon.

For example: plot(airquality$vala, res)

# Plot the residuals against the various regressors (one at a time) to see
# which variables heteroskedasticity may depend upon.
res <- residuals(reg1)
plot(airquality$vala, res)
plot(airquality$rain, res)

plot(airquality$dens, res)
plot(airquality$medi, res)

plot(airquality$coas, res)
There is clear heteroskedasticity only in relation to the variable coas. For the other
variables there are too few observations at high values of the x variable to draw a conclusion.

h. Assume that multiplicative heteroskedasticity is related to coas and medi.

Estimate the coefficients by running a regression of $\log \hat{\varepsilon}_i^2$ upon these two variables. Test the
null hypothesis of homoskedasticity on the basis of this auxiliary regression.

#Multiplicative Heteroskedasticity Test

res_sq <- (res^2)
aux.multip <- lm(log(res_sq) ~ coas + medi, data=airquality)
summary(aux.multip)

##
## Call:
## lm(formula = log(res_sq) ~ coas + medi, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1050 -0.8973 0.1499 0.9418 2.9630
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.437e+00 5.271e-01 8.418 4.98e-09 ***
## coas 1.572e+00 6.149e-01 2.556 0.0165 *
## medi -5.288e-05 2.293e-05 -2.306 0.0290 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.521 on 27 degrees of freedom
## Multiple R-squared: 0.2731, Adjusted R-squared: 0.2192
## F-statistic: 5.072 on 2 and 27 DF, p-value: 0.01349

Here, the auxiliary regression is used to test for multiplicative heteroskedasticity. We are
assuming a specific functional form and restricting the z vector to two variables. Both
variables (coas and medi) have p-values less than 0.05, meaning they are statistically
significant in explaining heteroskedasticity. The F-statistic is 5.072 with a p-value of
0.01349 (< 0.05), indicating that the model as a whole is statistically significant in
explaining the variance of the residuals. Testing for joint significance of both coefficients,
we reject the null of homoskedastic errors at the 5% level, but not at the 1% level.
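
The auxiliary regression can be written out as follows. This is a sketch of the usual multiplicative specification; the symbols $\alpha_0$, $\alpha_1$, $\alpha_2$ and $v_i$ are notation introduced here and do not appear in the tutorial text.

```latex
% Assumed variance function (multiplicative heteroskedasticity)
\sigma_i^2 = \sigma^2 \exp\left(\alpha_1\,\mathit{coas}_i + \alpha_2\,\mathit{medi}_i\right)

% Auxiliary regression estimated by OLS on the squared OLS residuals
\log \hat{\varepsilon}_i^2 = \alpha_0 + \alpha_1\,\mathit{coas}_i + \alpha_2\,\mathit{medi}_i + v_i

% Homoskedasticity corresponds to both slopes being zero
H_0:\ \alpha_1 = \alpha_2 = 0
\qquad
F = 5.072 \ \text{on } (2,\,27) \text{ degrees of freedom}
```

The F-statistic reported by summary(aux.multip) is the test of this joint null hypothesis.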

i. Based on the test in h. compute the FGLS estimator for the linear model. Compare the results
with the OLS estimates.

In general for the FGLS estimator:

res_sq <- (res^2)

# Step 1: run auxiliary regression (estimate model for variance in log)

aux.variance <- lm(log(res_sq)~ x1 + x2, data=dataset)

# Step 2: obtain estimate of the individual variance

variance <- exp(fitted(aux.variance))

# Step 3: estimate the model using lm and specify weights

model_fgls <- lm(y ~ x1 + x2 + x3, weights=1/variance, data=dataset)

Note: we have already done up to and including step 1 when answering h.

stargazer(model, model_fgls,
type="text",
no.space=TRUE)

#FGLS ESTIMATOR
variance <- exp(fitted(aux.multip))
# Step 3: estimate the model using lm and specify weights
model_fgls <- lm(airq ~ vala + rain + coas + dens + medi,
weights=1/variance, data=airquality)
stargazer(reg1, model_fgls,
type="text",
no.space=TRUE)

##
## ==========================================================
## Dependent variable:
## ----------------------------
## airq
## (1) (2)
## ----------------------------------------------------------
## vala 0.001 0.0001
## (0.002) (0.001)
## rain 0.251 0.165
## (0.344) (0.300)
## coas -33.398*** -32.647***
## (10.458) (7.744)
## dens -0.001 -0.001
## (0.002) (0.001)
## medi 0.001 0.001*
## (0.001) (0.0004)
## Constant 111.935*** 115.794***
## (15.332) (11.715)
## ----------------------------------------------------------
## Observations 30 30
## R2 0.383 0.602
## Adjusted R2 0.254 0.519
## Residual Std. Error (df = 24) 24.203 1.628
## F Statistic (df = 5; 24) 2.979** 7.262***
## ==========================================================
## Note: *p<0.1; **p<0.05; ***p<0.01

Note that medi is now marginally significant (at about the 6% level), while the significance
of the other regressors does not change. The R-squared is inflated: it measures the variation
of the transformed (weighted) variable y explained by the model, not the variation of y
itself (observations with a high estimated variance h receive less weight in the model).
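
The following sketch illustrates why the R-squared refers to a transformed variable: lm with weights = 1/variance gives the same coefficients as OLS on data divided by the estimated standard deviation. It uses simulated data rather than airquality.xlsx, and all names (x, z, h2, fgls_w, fgls_t) are made up for this illustration.

```r
set.seed(42)

# Simulated data (an assumption): error spread depends on a dummy z
n <- 30
x <- runif(n, 1, 10)
z <- rbinom(n, 1, 0.7)
y <- 2 + 0.5 * x + rnorm(n, sd = exp(0.4 * z))

# Step 1: OLS and auxiliary regression for the log squared residuals
ols <- lm(y ~ x)
aux <- lm(log(residuals(ols)^2) ~ z)

# Step 2: estimated individual variances
h2 <- exp(fitted(aux))

# Step 3a: FGLS via weighted least squares
fgls_w <- lm(y ~ x, weights = 1 / h2)

# Step 3b: the same estimator via OLS on transformed variables
h      <- sqrt(h2)
fgls_t <- lm(I(y / h) ~ 0 + I(1 / h) + I(x / h))

# The two sets of coefficients coincide
print(cbind(weighted = unname(coef(fgls_w)), transformed = unname(coef(fgls_t))))
```

Because the second regression explains y / h rather than y, goodness-of-fit measures from the FGLS fit are not comparable with those from OLS.
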

j. Obtain the robust standard errors. You need to install estimatr and car
model_robust <- lm_robust(y ~ x1 + x2 + x3, data=dataset)
modelsummary(list(model,model_fgls,model_robust), stars=TRUE)

#Robust Standard Errors

model_robust <- lm_robust(airq ~ vala + rain + coas + dens + medi,
data=airquality)
modelsummary(list(reg1,model_fgls,model_robust), stars=TRUE)

                  (1)           (2)           (3)
(Intercept)    111.935***    115.794***    111.935***
               (15.332)      (11.715)      (12.646)
vala             0.001         0.000         0.001
                (0.002)       (0.001)       (0.002)
rain             0.251         0.165         0.251
                (0.344)       (0.300)       (0.334)
coas           -33.398**     -32.647***    -33.398***
               (10.458)       (7.744)       (7.415)
dens            -0.001        -0.001        -0.001
                (0.002)       (0.001)       (0.001)
medi             0.001         0.001+        0.001
                (0.001)       (0.000)       (0.000)
Num.Obs.        30            30            30
R2               0.383         0.602         0.383
R2 Adj.          0.254         0.519         0.254
AIC            283.6         272.8         283.6
BIC            293.4         282.6         293.4
Log.Lik.      -134.815      -129.378
F                2.979         7.262
RMSE            21.65         21.81         21.65
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

OLS provides unbiased coefficients but unreliable standard errors under heteroskedasticity.
FGLS accounts for heteroskedasticity by reweighting observations based on their estimated
variance (using a multiplicative heteroskedasticity model). It provides more efficient
estimates under heteroskedasticity but assumes that the variance model is correctly
specified. Robust standard errors use the OLS coefficients but compute
heteroskedasticity-robust standard errors. This provides valid inference without assuming a
specific variance model, but may be less efficient than FGLS if the heteroskedasticity model
is correct.
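
To show what a heteroskedasticity-robust standard error does, here is a base R sketch of the HC2 sandwich estimator, which to my understanding is the default type used by lm_robust. It runs on simulated data, not on airquality.xlsx, and the names (X, e, h, vcov_hc2) are made up for this illustration.

```r
set.seed(7)

# Simulated data (an assumption): error spread grows with x
n <- 30
x <- runif(n, 0, 5)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + x)

fit <- lm(y ~ x)
X   <- model.matrix(fit)   # n x 2 design matrix
e   <- residuals(fit)      # OLS residuals
h   <- hatvalues(fit)      # leverage of each observation

# Sandwich: (X'X)^-1 [ X' diag(e^2 / (1 - h)) X ] (X'X)^-1
XtX_inv  <- solve(crossprod(X))
meat     <- crossprod(X * (e / sqrt(1 - h)))
vcov_hc2 <- XtX_inv %*% meat %*% XtX_inv

se_hc2 <- sqrt(diag(vcov_hc2))
se_ols <- sqrt(diag(vcov(fit)))

# Same coefficients, different standard errors
print(cbind(estimate = coef(fit), se_ols = se_ols, se_hc2 = se_hc2))
```

The coefficients are untouched; only the estimated covariance matrix changes, which is why columns (1) and (3) in the table above share the same estimates.
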

k. Obtain the bootstrapped standard errors. You need to install sandwich

coeftest(model, vcov = vcovBS(model))

# Bootstrapped Standard Errors

library(sandwich)

## Warning: package 'sandwich' was built under R version 4.4.2

coeftest(reg1, vcov = vcovBS(reg1))

##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1193e+02 1.2151e+01 9.2118 2.382e-09 ***
## vala 8.8339e-04 3.4095e-03 0.2591 0.7977709
## rain 2.5070e-01 3.3457e-01 0.7493 0.4609464
## coas -3.3398e+01 8.5125e+00 -3.9235 0.0006392 ***
## dens -1.0734e-03 3.2379e-03 -0.3315 0.7431462
## medi 5.5449e-04 1.4824e-03 0.3740 0.7116556
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

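Bootstrapping is non-parametric and does not assume a specific error distribution. It is
particularly useful when classical methods (e.g., heteroskedasticity-robust covariance
estimators) might fail or when the sample size is small. Because the resampling is random,
the bootstrapped standard errors change slightly from run to run unless a seed is set.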
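
The idea behind bootstrapped standard errors can be sketched by hand in base R. The example below resamples rows with replacement (a pairs bootstrap, which I believe is also the default type in vcovBS) on simulated data. The data, the names dat, B and boot_coefs, and the choice of 250 replications are assumptions made for this illustration.

```r
set.seed(123)

# Simulated data (an assumption): error spread grows with x
n   <- 30
dat <- data.frame(x = runif(n, 0, 5))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5 + dat$x)

fit <- lm(y ~ x, data = dat)

# Refit the model on B resampled data sets and collect the coefficients
B <- 250
boot_coefs <- t(replicate(B, {
  idx <- sample(n, replace = TRUE)       # draw rows with replacement
  coef(lm(y ~ x, data = dat[idx, ]))
}))

# Bootstrap standard error: standard deviation across the replications
se_boot <- apply(boot_coefs, 2, sd)

print(cbind(estimate = coef(fit), se_boot = se_boot))
```

Setting the seed before the resampling step makes the standard errors reproducible; the same applies to a call such as vcovBS(reg1).
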
