Regression3 Slides
Marco Baroni
Practical Statistics in R
Outline
Logistic regression
Logistic regression in R
Outline
Logistic regression
Introduction
The model
Looking at and comparing fitted models
Logistic regression in R
Modeling discrete response variables
Classic multiple regression
y = β0 + β1 × x1 + β2 × x2 + ... + βn × xn + ε
[Figure: logit(p) on the vertical axis, from −5 to 5]
logit(p) = β0 + β1 × x1 + β2 × x2 + ... + βn × xn
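• Back to probabilities:
$$p = \frac{e^{\mathrm{logit}(p)}}{1 + e^{\mathrm{logit}(p)}}$$
• Thus:
$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n}}$$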
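As a minimal sketch of this conversion, the R code below turns a linear predictor into a probability, first by writing the formula out and then with base R's plogis(). The coefficient and predictor values are made up for illustration and do not come from any data set in these slides.

```r
# Hypothetical coefficients and predictor values, for illustration only
beta <- c(-1.5, 0.8, 0.3)   # beta0, beta1, beta2
x    <- c(1, 2.0, -1.0)     # the leading 1 multiplies the intercept

# Linear predictor: logit(p) = beta0 + beta1 * x1 + beta2 * x2
logit_p <- sum(beta * x)

# Back to a probability, written out as in the formula
p_manual <- exp(logit_p) / (1 + exp(logit_p))

# plogis() is base R's inverse logit; qlogis() is the logit itself
p_builtin <- plogis(logit_p)

print(c(logit = logit_p, p_manual = p_manual, p_builtin = p_builtin))
```

The two probabilities should agree, and qlogis(p_manual) should recover the original linear predictor.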
From log odds ratios to probabilities
[Figure: p (vertical axis, 0 to 1) plotted against logit(p) (horizontal axis, −10 to 10)]
Probabilities and responses
[Figure: p (vertical axis, 0 to 1) against logit(p) (horizontal axis, −10 to 10), with observed responses shown as points at p = 1 and p = 0]
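The contrast between probabilities and responses can be sketched with simulated data: the model yields a probability between 0 and 1 for each value of logit(p), whereas every observed response is either 0 or 1. The grid of log odds and the random seed below are arbitrary choices made for this illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
logit_p <- seq(-10, 10, by = 2.5)  # an arbitrary grid of log odds
p <- plogis(logit_p)               # corresponding probabilities

# Each observed response is a single 0/1 draw with success probability p
y <- rbinom(length(p), size = 1, prob = p)

print(data.frame(logit_p, p = round(p, 3), y))
```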
A subtle point: no error term
• NB:
logit(p) = β0 + β1 × x1 + β2 × x2 + ... + βn × xn
$$g(E(y)) = X\beta$$
• In ordinary linear regression, the link function g is the identity:
$$g(E(y)) = E(y)$$
• Given the mean, observations are normally distributed with variance estimated from the data
• This corresponds to the error term with mean 0 in the linear regression model
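A small sketch on simulated data can make this concrete: fitting the same model with lm() and with glm() using a Gaussian family and identity link should give the same coefficients, and the dispersion estimated by glm() should match the residual variance from lm(). The data, seed, and true coefficients are invented for the example.

```r
set.seed(42)                   # arbitrary seed
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)    # simulated data with a normal error term

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# Coefficient estimates from the two fits
print(coef(fit_lm))
print(coef(fit_glm))

# Variance estimated from the data: residual variance vs. GLM dispersion
var_lm  <- summary(fit_lm)$sigma^2
var_glm <- summary(fit_glm)$dispersion
print(c(lm = var_lm, glm = var_glm))
```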
Logistic regression as a generalized linear model
$$g(E(y)) = X\beta$$
• The “link” function is:
$$g(E(y)) = \log\frac{E(y)}{1 - E(y)}$$
• Given E(y), i.e., p, observations have a Bernoulli distribution with variance p(1 − p)
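These pieces can be inspected directly in R, since the family object that glm() uses for logistic regression stores the link, its inverse, and the variance function. The sketch below checks them against the formulas; the probabilities in p are arbitrary example values.

```r
fam <- binomial(link = "logit")   # family object used by glm()

p <- c(0.1, 0.5, 0.9)             # arbitrary example probabilities

# Link function: log(p / (1 - p))
link_vals <- fam$linkfun(p)
print(link_vals)
print(log(p / (1 - p)))

# The inverse link maps the linear predictor back to p
back <- fam$linkinv(link_vals)
print(back)

# Variance function of the Bernoulli distribution: p * (1 - p)
v <- fam$variance(p)
print(v)
```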
Estimation of logistic regression models
Interpreting the βs
Outline
Logistic regression
Logistic regression in R
Preparing the data and fitting the model
Practice
Back to Graffeo et al.’s discount study
Fields in the discount.txt file
> sex_age_pres_prod.glm <- glm(choice ~ sex + age +
    presentation + product, family = "binomial")
> summary(sex_age_pres_prod.glm)
Selected lines from the summary() output
> interaction.glm <- glm(choice ~ sex + age + presentation +
    product + sex:presentation, family = "binomial")
> anova(sex_age_pres_prod.glm, interaction.glm,
    test = "Chisq")
...
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1      1168    1284.25
2      1166    1277.68  2     6.57      0.04
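The p-value in this table can be reproduced by hand, which shows what the test compares: the drop in residual deviance between the two models is referred to a chi-squared distribution whose degrees of freedom equal the number of added parameters. The sketch below uses only the numbers printed in the table; the variable names are introduced here for illustration.

```r
# Residual deviances and degrees of freedom copied from the anova() table
dev_main <- 1284.25; df_main <- 1168   # model without the interaction
dev_int  <- 1277.68; df_int  <- 1166   # model with sex:presentation

dev_diff <- dev_main - dev_int         # drop in deviance
df_diff  <- df_main - df_int           # number of added parameters

# Upper-tail chi-squared probability of the deviance difference
p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

print(c(deviance = dev_diff, df = df_diff, p = round(p_value, 2)))
```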
> table(choice)
choice
N Y
363 813
> sum(choice=="N")/length(choice)
[1] 0.3086735
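One possible use of these counts, offered here as an assumption about why they are shown, is as a baseline: a rule that always predicts the more frequent response is right in a fixed proportion of cases, and a fitted model can be judged against that. The sketch below recomputes the proportions from the counts alone.

```r
# Counts copied from the table(choice) output
counts <- c(N = 363, Y = 813)

prop <- counts / sum(counts)   # proportion of each response
print(prop)

# Accuracy of always predicting the more frequent response
baseline <- max(prop)
print(baseline)
```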
Practice time