ComandosR RLS

##############################################################################
CE071: ANALISE DE REGRESSAO LINEAR PROFa: SUELY RUIZ GIOLO
CODIGOS R VERSAO 2.10

EXEMPLOS: REGRESSÃO LINEAR SIMPLES
##############################################################################
=====================================================================
EXEMPLO 1 Fonte: Bussab (1988).
Y = tempo de reação a certo estímulo (em segundos)
X = idade (em anos)
=====================================================================
ex1<-read.table("http://www.ufpr.br/~giolo/CE071/Exemplos/Exemplo1s.txt",h=T)
attach(ex1)
ex1
tempo idade
1 96 20
2 92 20
3 106 20
4 100 20
5 98 25
6 104 25
7 110 25
8 101 25
9 116 30
10 106 30
11 109 30
12 100 30
13 112 35
14 105 35
15 118 35
16 108 35
17 113 40
18 112 40
19 127 40
20 117 40
x<-idade
y<-tempo
plot(x,y, pch=16, col=4)
points(c(20,25,30,35,40),c(98.5,103.25,107.75,110.75,117.25), pch=8, col=2)
cor(x,y)
[1] 0.7680814
cor.test(x,y)
Pearson's product-moment correlation
data: x and y
t = 5.0889, df = 18, p-value = 7.662e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4931929 0.9035073
sample estimates:
cor
0.7680814
mod1<-lm(y~x)
anova(mod1)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x 1 810.00 810.00 25.897 7.662e-05 ***
Residuals 18 563.00 31.28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(mod1)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-7.500 -4.125 -0.750 2.625 10.500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 80.5000 5.4510 14.768 1.67e-11 ***
x 0.9000 0.1769 5.089 7.66e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.593 on 18 degrees of freedom

Multiple R-squared: 0.5899, Adjusted R-squared: 0.5672
F-statistic: 25.9 on 1 and 18 DF, p-value: 7.662e-05
x1<-20:40
ye<-80.5+0.9*x1
lines(x1,ye)
par(mfrow=c(2,2))
plot(mod1, which=c(1:4), add.smooth=FALSE, pch=20)
shapiro.test(mod1$resid)
Shapiro-Wilk normality test
W = 0.9376, p-value = 0.2161
Obs: Como para cada valor de x nesse exemplo há mais de um y observado (repetições),
é possível utilizar o teste de Bartlett para testar a homogeneidade de variâncias.
bartlett.test(y~x)
Bartlett test of homogeneity of variances
data: y by x
Bartlett's K-squared = 0.2987, df = 4, p-value = 0.9899
predict(lm(y~x),interval="confidence", se.fit=T, level=0.95)
$fit
fit lwr upr
1 98.5 93.94935 103.0507
2 98.5 93.94935 103.0507
3 98.5 93.94935 103.0507
4 98.5 93.94935 103.0507
5 103.0 99.78220 106.2178
6 103.0 99.78220 106.2178
7 103.0 99.78220 106.2178
8 103.0 99.78220 106.2178
9 107.5 104.87268 110.1273
10 107.5 104.87268 110.1273
11 107.5 104.87268 110.1273
12 107.5 104.87268 110.1273
13 112.0 108.78220 115.2178
14 112.0 108.78220 115.2178
15 112.0 108.78220 115.2178
16 112.0 108.78220 115.2178
17 116.5 111.94935 121.0507
18 116.5 111.94935 121.0507
19 116.5 111.94935 121.0507
20 116.5 111.94935 121.0507
$se.fit
[1] 2.166026 2.166026 2.166026 2.166026 1.531611 1.531611 1.531611 1.531611
[9] 1.250555 1.250555 1.250555 1.250555 1.531611 1.531611 1.531611 1.531611
[17] 2.166026 2.166026 2.166026 2.166026
$df
[1] 18
$residual.scale
[1] 5.592654
new <- data.frame(x = seq(24,28,1))

new
x
1 24
2 25
3 26
4 27
5 28
predict(lm(y~x),new,interval="prediction", se.fit=T, level=0.95)
$fit
fit lwr upr
1 102.1 89.85545 114.3445
2 103.0 90.81762 115.1824
3 103.9 91.76872 116.0313
4 104.8 92.70862 116.8914
5 105.7 93.63720 117.7628
$se.fit
1 2 3 4 5
1.640088 1.531611 1.436779 1.358451 1.299615
$df
[1] 18
$residual.scale
[1] 5.592654
pred<- predict(lm(y~x),interval="confidence", level=0.95)
pred1<- predict(lm(y~x),interval="prediction", level=0.95)
matplot(x,cbind(pred, pred1[,-1]), lty=c(1,2,2,3,3), col=c(1,2,2,4,4),
type="l", ylab="valores preditos de y")
===============================================================================
EXEMPLO 2 Fonte: Montgomery e Peck (1992).
Y = tempo necessário para um entregador repor e executar pequenos serviços
em máquinas automáticas de vendas de refrigerantes (em minutos)
X = quantidade de volumes repostos nas máquinas (em unidades)
===============================================================================
attach(ex2)
ex2
Y X1
1 16.68 7
2 11.50 3
3 12.03 3
4 14.88 4
5 13.75 6
6 18.11 7
7 8.00 2
8 17.83 7
9 79.24 30
10 21.50 5
11 40.33 16
12 21.00 10
13 13.50 4
14 19.75 6
15 24.00 9
16 29.00 10
17 15.35 6
18 19.00 7
19 9.50 3
20 35.10 17
21 17.90 10
22 52.32 26
23 18.75 9
24 19.83 8
25 10.75 4
plot(X1, Y, pch=16, col=4)

cor(X1,Y)
[1] 0.9646146
cor.test(X1,Y)
data: X1 and Y
t = 17.5455, df = 23, p-value = 8.216e-15
0.9202275 0.9845031
sample estimates:
cor
0.9646146
mod1<-lm(Y~X1)
anova(mod1)

Response: Y
X1 1 5382.4 5382.4 307.85 8.22e-15 ***
Residuals 23 402.1 17.5
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(mod1)
Residuals:
-7.5811 -1.8739 -0.3493 2.1807 10.6342
Coefficients:
(Intercept) 3.321 1.371 2.422 0.0237 *
X1 2.176 0.124 17.546 8.22e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
par(mfrow=c(2,2))
data: mod1$resid
W = 0.9671, p-value = 0.5718
x1<-2:30
ye<-3.321+2.176*x1
lines(x1,ye)
===============================================================================
EXEMPLO 3
Y = tempo necessário para um comerciante estocar uma prateleira da mercearia
com refrigerantes (em minutos)
X = quantidade da mercadoria (em unidades)
Fonte: Montgomery & Peck (1992).
===============================================================================
attach(ex3)
y x
1 10.15 25
2 2.96 6
3 3.00 8
4 6.88 17
5 0.28 2
6 5.06 13
7 9.14 23
8 11.86 30
9 11.69 28
10 6.04 14
11 7.57 19
12 1.74 4
13 9.38 24
14 0.16 1
15 1.84 5
cor(x,y)
[1] 0.9973597
cor.test(x,y)

data: x and y
t = 49.519, df = 13, p-value = 4.441e-16
0.9918363 0.9991477
sample estimates:
cor
0.9973597
mod1<-lm(y~x)
anova(mod1)

Response: y
x 1 228.318 228.318 2452.1 3.399e-16 ***
Residuals 13 1.210 0.093
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(mod1)
Residuals:
-0.4405 -0.1582 -0.1018 0.1357 0.6111
Coefficients:
(Intercept) -0.093756 0.143577 -0.653 0.525
x 0.407107 0.008221 49.519 3.4e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

F-statistic: 2452 on 1 and 13 DF, p-value: 3.399e-16
par(mfrow=c(2,2))

W = 0.9321, p-value = 0.2937

x1<-1:30
ye<- -0.093756+0.407107*x1
lines(x1,ye)
==========================================================
# limite inferior da variação de x muito próxima de zero
# e faz sentido que quando x = 0 se tenha y = 0
==========================================================
mod2<-lm(y~-1 + x) # modelo sem intercepto

summary(mod2)
Call:
lm(formula = y ~ -1 + x)
Residuals:
-0.5252 -0.2198 -0.1202 0.1070 0.5443
Coefficients:
x 0.402619 0.004418 91.13 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

F-statistic: 8305 on 1 and 14 DF, p-value: < 2.2e-16
anova(mod2)

Response: y
x 1 741.62 741.62 8305.2 < 2.2e-16 ***
Residuals 14 1.25 0.09
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
par(mfrow=c(2,2))
data: mod2$resid
W = 0.929, p-value = 0.2639
par(mfrow=c(1,1))
x1<-min(x):max(x)
ye<-mod2$coef[1]*x1
lines(x1,ye)
ye1<-mod1$coef[1]+mod1$coef[2]*x1
lines(x1,ye1, lty=4)
===============================================================================
EXEMPLO 4
Y = Renda média mensal sobre vendas de comida em restaurantes (em mil dólares)
X = Despesas anual com propagandas (em mil dólares)
Fonte: Montgomery e Peck (1992)
===============================================================================
rm(list = ls())
attach(ex4)
ex4
y x
1 81.464 3.000
2 72.661 3.150
3 72.344 3.085
4 90.743 5.225
5 98.588 5.350
6 96.507 6.090
7 126.574 8.925
8 114.133 9.015
9 115.814 8.885
10 123.181 8.950
11 131.434 9.000
12 140.564 11.345
13 151.352 12.275
14 146.926 12.400
15 130.963 12.525
16 144.630 12.310
17 147.041 13.700
18 179.021 15.000
19 166.200 15.175
20 180.732 14.995
21 178.187 15.050
22 185.304 15.200
23 155.931 15.150
24 172.579 16.800
25 188.851 16.500
26 192.424 17.830
27 203.112 19.500
28 192.482 19.200
29 218.715 19.000
30 214.317 19.350

cor(x,y)
[1] 0.9777232
cor.test(x,y)
data: x and y
t = 24.6482, df = 28, p-value < 2.2e-16
0.9532193 0.9894611
sample estimates:
cor
0.9777232
mod1<-lm(y~x)
anova(mod1)

Response: y
x 1 49200 49200 607.53 < 2.2e-16 ***
Residuals 28 2268 81
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(mod1)
Call:
lm(formula = y ~ x)
Residuals:
-19.2871 -4.8273 -0.6383 7.3630 16.3512
Coefficients:
(Intercept) 49.4434 4.2889 11.53 3.81e-12 ***
x 8.0484 0.3265 24.65 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

F-statistic: 607.5 on 1 and 28 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
data: mod1$resid
W = 0.9774, p-value = 0.7523
x1<-3:20
ye<-49.4434+8.0484*x1
lines(x1,ye,type="l")
===============================================================================
EXEMPLO 5: Dados: trees (Fonte: Disponível no R)
Y = diâmetro de árvores (em polegadas)
X = volume de madeira (em pés cúbicos)
===============================================================================
data(trees)
help(trees)
attach(trees)
trees
Girth Height Volume

1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
4 10.5 72 16.4
5 10.7 81 18.8
6 10.8 83 19.7
7 11.0 66 15.6
8 11.0 75 18.2
9 11.1 80 22.6
10 11.2 75 19.9
11 11.3 79 24.2
12 11.4 76 21.0
13 11.4 76 21.4
14 11.7 69 21.3
15 12.0 75 19.1
16 12.9 74 22.2
17 12.9 85 33.8
18 13.3 86 27.4
19 13.7 71 25.7
20 13.8 64 24.9
21 14.0 78 34.5
22 14.2 80 31.7
23 14.5 74 36.3
24 16.0 72 38.3
25 16.3 77 42.6
26 17.3 81 55.4
27 17.5 82 55.7
28 17.9 80 58.3
29 18.0 80 51.5
30 18.0 80 51.0
31 20.6 87 77.0
plot(Girth,Volume)
cor(Girth,Volume)
[1] 0.9671194
cor.test(Girth,Volume)
data: Girth and Volume

t = 20.4783, df = 29, p-value < 2.2e-16
0.9322519 0.9841887
sample estimates:
cor
0.9671194
fit1<-lm(Volume~Girth)
summary(fit1)
Call:
lm(formula = Volume ~ Girth)
Residuals:
-8.0654 -3.1067 0.1520 3.4948 9.5868
Coefficients:
(Intercept) -36.9435 3.3651 -10.98 7.62e-12 ***
Girth 5.0659 0.2474 20.48 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

anova(fit1)

Response: Volume
Girth 1 7581.8 7581.8 419.36 < 2.2e-16 ***
Residuals 29 524.3 18.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> shapiro.test(fit1$res)

data: fit1$res
W = 0.9789, p-value = 0.7811
par(mfrow=c(2,2))
plot(fit1, which=c(1:4), add.smooth=FALSE, pch=20)
Como podemos tentar melhorar esse modelo?

################################################
TRANSFORMAÇÕES EM Y
################################################
Exemplo2 – Transformação Raiz Quadrado em Y
attach(ex2)
mod2<-lm(sqrt(Y)~X1)
par(mfrow=c(2,2))
plot(mod2$fitted.values, mod2$resid, pch=16)

summary(mod2)
Residuals:
-0.59277 -0.25560 0.01804 0.22476 0.81350
Coefficients:
(Intercept) 2.87028 0.11886 24.15 < 2e-16 ***
X1 0.19061 0.01075 17.73 6.6e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
data: mod2$resid
W = 0.9794, p-value = 0.8725
require(MASS)
boxcox(Y~X1,data=ex2,plotit=T)
boxcox(Y~X1,data=ex2,lam=seq(-0.5,1.5,1/10))
boxcox(Y~X1,data=ex2,plotit=F)
$x
[1] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6
[16] -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
[31] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
$y
[1] -96.83566 -95.53286 -94.24200 -92.96237 -91.69314 -90.43331
[7] -89.18172 -87.93706 -86.69782 -85.46232 -84.22875 -82.99519
[13] -81.75969 -80.52043 -79.27594 -78.02549 -76.76964 -75.51119
[19] -74.25647 -73.01738 -71.81411 -70.67886 -69.66004 -68.82601
[25] -68.26562 -68.08184 -68.37491 -69.21697 -70.62856 -72.57098
[31] -74.95981 -77.69050 -80.66194 -83.79002 -87.01158 -90.28253
[37] -93.57378 -96.86716 -100.15195 -103.42240 -106.67591
###############################################################
Exemplo 4 – Transformações em Y
###############################################################
rm(list = ls())
attach(ex4)
mod1<-lm(y~x)
par(mfrow=c(2,2))
mod2<-lm(sqrt(y)~x)
par(mfrow=c(2,2))
mod3<-lm(log(y)~x)
par(mfrow=c(2,2))
par(mfrow=c(1,1))
require(MASS)
boxcox(y~x,data=ex4,plotit=T)
boxcox(y~x,data=ex4,lam=seq(-0.5,1.5,1/10))
par(mfrow=c(2,2))
mod4<-lm((y^0.6)~x)
W = 0.9774, p-value = 0.7523
W = 0.9612, p-value = 0.3327
W = 0.9593, p-value = 0.2969
W = 0.9643, p-value = 0.3971
#####################################################
Exemplo 5 – Transformações em Y
#####################################################
data(trees)
attach(trees)
fit2<-lm(sqrt(Volume)~Girth)
summary(fit2)
Residuals:
-0.56640 -0.19429 -0.01169 0.20934 0.65575
Coefficients:
(Intercept) -0.55183 0.23719 -2.327 0.0272 *
Girth 0.44262 0.01744 25.385 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

anova(fit2)

Response: sqrt(Volume)
Girth 1 57.881 57.881 644.42 < 2.2e-16 ***
Residuals 29 2.605 0.090
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
shapiro.test(fit2$res)

data: fit2$res
W = 0.9858, p-value = 0.9457
par(mfrow=c(2,2))
require(MASS)
boxcox(Volume~Girth,data=trees,plotit=T)
boxcox(Volume~Girth,data=trees,lam=seq(-0.5,1.5,1/10))
bc<-boxcox(Volume~Girth,data=trees,plotit=F)
cbind(bc$x,bc$y)
[,1] [,2]
[1,] -2.0 -137.69639
[2,] -1.9 -135.43059
[3,] -1.8 -133.16051
[4,] -1.7 -130.88450
[5,] -1.6 -128.60076
[6,] -1.5 -126.30727
[7,] -1.4 -124.00186
[8,] -1.3 -121.68223
[9,] -1.2 -119.34600
[10,] -1.1 -116.99079
[11,] -1.0 -114.61444
[12,] -0.9 -112.21525
[13,] -0.8 -109.79244
[14,] -0.7 -107.34678
[15,] -0.6 -104.88164
[16,] -0.5 -102.40450
[17,] -0.4 -99.92915
[18,] -0.3 -97.47883
[19,] -0.2 -95.09024
[20,] -0.1 -92.81840
[21,] 0.0 -90.74102
[22,] 0.1 -88.95991
[23,] 0.2 -87.59521
[24,] 0.3 -86.76836
[25,] 0.4 -86.57447
[26,] 0.5 -87.05365
[27,] 0.6 -88.17730
[28,] 0.7 -89.85812
[29,] 0.8 -91.97739
[30,] 0.9 -94.41372
[31,] 1.0 -97.06207
[32,] 1.1 -99.84112
[33,] 1.2 -102.69275
[34,] 1.3 -105.57791
[35,] 1.4 -108.47175
[36,] 1.5 -111.35941
[37,] 1.6 -114.23277
[38,] 1.7 -117.08805
[39,] 1.8 -119.92427
[40,] 1.9 -122.74207
[41,] 2.0 -125.54309
fit3<-lm((Volume^0.4)~Girth)
par(mfrow=c(2,2))
#####################################################
Exemplo 6 – Linearização
Y = número médio de bactérias sobreviventes
em um produto alimentício enlatado
X = minutos de exposição a 300º F
Fonte: Montgomery & Peck (1992).
#####################################################
y<-c(175,108,95,82,71,50,49,31,28,17,16,11)
x<-1:12
cbind(y,x)
y x
[1,] 175 1
[2,] 108 2
[3,] 95 3
[4,] 82 4
[5,] 71 5
[6,] 50 6
[7,] 49 7
[8,] 31 8
[9,] 28 9
[10,] 17 10
[11,] 16 11
[12,] 11 12

cor.test(x,y)
yt<-log(y)
plot(x,yt, pch=16, col=2)
cor.test(x,yt)
data: x and yt
t = -23.4591, df = 10, p-value = 4.489e-10
-0.9975654 -0.9672859
sample estimates:
cor
-0.9910365
mod1<-lm(yt~x)
summary(mod1)
Call:
lm(formula = yt ~ x)
Residuals:
-0.184303 -0.083994 0.001453 0.072825 0.206246
Coefficients:
(Intercept) 5.33878 0.07409 72.05 6.47e-15 ***
x -0.23617 0.01007 -23.46 4.49e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

anova(mod1)
Response: yt
x 1 7.9761 7.9761 550.33 4.489e-10 ***
Residuals 10 0.1449 0.0145
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
par(mfrow=c(2,2))
data: mod1$resid
W = 0.9852, p-value = 0.9968
par(mfrow=c(1,2))
xe<-min(x):max(x)
yte<-mod1$coef[1] + mod1$coef[2]*xe
plot(x, yt, pch=16, col=4, ylab="ln(y)")
lines(xe, yte)
plot(x, y, pch=16, col=4, ylab="ln(y)")
ye<-exp(mod1$coef[1])*exp(mod1$coef[2]*xe)
lines(xe, ye)
##############################################################
Exemplo 7 – Transformação em X
Y = energia elétrica gerada por um moinho de vento
X = velocidade do vento (em MPH)
Fonte: Montgomery e Peck, 1992.
##############################################################
rm(list = ls())
attach(ex7)
plot(x,y, pch=16)
## transformacao em x: procedimento de Box-Tidwell
## chute inicial: alpha0 = 1
mod1<-lm(y~x)
summary(mod1)
Residuals:
-0.59869 -0.14099 0.06059 0.17262 0.32184
Coefficients:
(Intercept) 0.13088 0.12599 1.039 0.31
x 0.24115 0.01905 12.659 7.55e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

w<-x*log(x)
mod2<-lm(y~x+w)
summary(mod2)
Residuals:
-0.223004 -0.029442 0.009955 0.048323 0.181553
Coefficients:
(Intercept) -2.41684 0.28512 -8.477 2.23e-08 ***
x 1.53443 0.14189 10.814 2.85e-10 ***
w -0.46260 0.05065 -9.132 6.13e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

alpha1<-(mod2$coef[3]/mod1$coef[2]) + 1
alpha1
-0.918302
x1<-x^alpha1
mod11<-lm(y~x1)
summary(mod11)
Residuals:
-0.19539 -0.06179 0.01245 0.07813 0.12392
Coefficients:
(Intercept) 3.10386 0.04748 65.38 <2e-16 ***
x1 -6.67842 0.19538 -34.18 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

w1<-x1*log(x1)
mod22<-lm(y~x1+w1)
summary(mod22)
Call:
lm(formula = y ~ x1 + w1)
Residuals:
-0.18507 -0.07049 0.01960 0.06727 0.12975
Coefficients:
(Intercept) 3.2409 0.2708 11.968 4.18e-11 ***
x1 -6.4445 0.4962 -12.987 8.58e-12 ***
w1 0.5994 1.1652 0.514 0.612
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

alpha2<-(mod22$coef[3]/mod11$coef[2]) + alpha1
alpha2
-1.008055
# transformacao indicada: x-1 = 1/x
xt<-1/x
plot(xt,y, pch=16,col=2)
mod3<-lm(y~xt)
summary(mod3)
Residuals:
-0.20547 -0.04941 0.01100 0.08352 0.12204
Coefficients:
(Intercept) 2.9789 0.0449 66.34 <2e-16 ***
xt -6.9345 0.2064 -33.59 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

par(mfrow=c(2,2))

data: mod3$resid
W = 0.9292, p-value = 0.0835
par(mfrow=c(1,1))
plot(xt,y, pch=16, col=4, xlab="1/x")
xe<-seq(0,0.41,by=0.05)
ye<-mod3$coef[1]+mod3$coef[2]*xe
lines(xe,ye)
Exemplo 4: Mínimos Quadrados Ponderados
rm(list = ls())
attach(ex4)
mod1<-lm(y~x)
par(mfrow=c(1,2))
plot(x, mod1$resid, pch = 20)
plot(mod1$fitted, mod1$resid, pch = 20)
m1<-mean(x[1:3])
s1<-var(y[1:3])
m2<-mean(x[4:5])
s2<-var(y[4:5])
m3<-mean(x[7:11])
s3<-var(y[7:11])
m4<-mean(x[13:16])
s4<-var(y[13:16])
m5<-mean(x[18:23])
s5<-var(y[18:23])
m6<-mean(x[24:25])
s6<-var(y[24:25])
m7<-mean(x[27:30])
s7<-var(y[27:30])
medias<-as.vector(c(m1,m2,m3,m4,m5,m6,m7))
s2<-as.vector(c(s1,s2,s3,s4,s5,s6,s7))
medias
[1] 3.078333 5.287500 8.955000 12.377500 15.095000 16.650000 19.262500
s2
[1] 26.79462 30.77201 52.80369 77.28016 120.57106 132.38899 138.85687
plot(medias, s2, pch=16, col=4)
lm(s2~medias)
Coefficients:
(Intercept) medias
-7.376 7.820
aux<--7.376+7.820*x
wi<-1/aux
wi
[1] 0.062173589 0.057947500 0.059706126 0.029865456 0.029018311 0.024846079
[7] 0.016021148 0.015842513 0.016101841 0.015971124 0.015872008 0.012293787
[13] 0.011284835 0.011161711 0.011041245 0.011250087 0.010024259 0.009097194
[19] 0.008985331 0.009100431 0.009064950 0.008969575 0.009001143 0.008064516
[25] 0.008220034 0.007572625 0.006891134 0.007004371 0.007081952 0.006947291
mod3<-lm(y~x,weights=wi)
summary(mod3)
Residuals:
-2.0214 -0.7388 -0.1134 0.8181 1.6763
Coefficients:
(Intercept) 50.9746 2.5073 20.33 <2e-16 ***
x 7.9222 0.2532 31.28 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

anova(mod3)

Response: y
x 1 905.50 905.50 978.63 < 2.2e-16 ***
Residuals 28 25.91 0.93
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
rqwi<-sqrt(wi)
we<-rqwi*mod3$resid
wx<-rqwi*x
wy<-rqwi*mod3$fitted.values
plot(wy,we, pch=16)
mod1<-lm(y~x)
mod2<-lm(sqrt(y)~x)
mod3<-lm(y~x,weights=wi)
par(mfrow=c(1,3))
plot(mod1$fitted.values,mod1$resid)
plot(mod2$fitted.values,mod2$resid)
plot(wy,we)
par(mfrow=c(1,2))
plot(x,sqrt(y), pch=16,col=2)
xe<- min(x):max(x)
ye2<- mod2$coef[1]+mod2$coef[2]*xe
ye3<- mod3$coef[1]+mod3$coef[2]*xe
lines(xe,ye2)
title("y^0.5 = 7.8079 + 0.3454*x")
plot(x,y, pch=16,col=2)
lines(xe,ye3)
title("y = 50.975 + 7.922*x")

ComandosR RLS

Uploaded by

Copyright:

Available Formats

ComandosR RLS

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ComandosR RLS

Uploaded by

Copyright:

Available Formats

##############################################################################

CE071: ANALISE DE REGRESSAO LINEAR PROFa: SUELY RUIZ GIOLO

CODIGOS R VERSAO 2.10

Pearson's product-moment correlation

Analysis of Variance Table

Residual standard error: 5.593 on 18 degrees of freedom

Bartlett test of homogeneity of variances

predict(lm(y~x),interval="confidence", se.fit=T, level=0.95)

new <- data.frame(x = seq(24,28,1))

predict(lm(y~x),new,interval="prediction", se.fit=T, level=0.95)

plot(X1, Y, pch=16, col=4)

Analysis of Variance Table

Shapiro-Wilk normality test

plot(x,y, pch=16, col=4)

Pearson's product-moment correlation

Analysis of Variance Table

Residual standard error: 0.3051 on 13 degrees of freedom

Shapiro-Wilk normality test

plot(x,y, pch=16, col=4)

mod2<-lm(y~-1 + x) # modelo sem intercepto

Residual standard error: 0.2988 on 14 degrees of freedom

Analysis of Variance Table

Shapiro-Wilk normality test

plot(x,y, pch=16, col=4)

Pearson's product-moment correlation

Analysis of Variance Table

Residual standard error: 8.999 on 28 degrees of freedom

Girth Height Volume

Pearson's product-moment correlation

data: Girth and Volume

Residual standard error: 4.252 on 29 degrees of freedom

Analysis of Variance Table

Shapiro-Wilk normality test

Como podemos tentar melhorar esse modelo?

Exemplo2 – Transformação Raiz Quadrado em Y

plot(mod2$fitted.values, mod2$resid, pch=16)

Shapiro-Wilk normality test

Residual standard error: 0.2997 on 29 degrees of freedom

Analysis of Variance Table

Shapiro-Wilk normality test

plot(x,y, pch=16, col=2)

Pearson's product-moment correlation

Residual standard error: 0.1204 on 10 degrees of freedom

Shapiro-Wilk normality test

Residual standard error: 0.2361 on 23 degrees of freedom

Residual standard error: 0.1103 on 22 degrees of freedom

Residual standard error: 0.09258 on 23 degrees of freedom

Residual standard error: 0.0941 on 22 degrees of freedom

# transformacao indicada: x-1 = 1/x

Residual standard error: 0.09417 on 23 degrees of freedom

Shapiro-Wilk normality test

plot(medias, s2, pch=16, col=4)

Residual standard error: 0.9619 on 28 degrees of freedom

Analysis of Variance Table

You might also like