Logistic regression predicts the probability of a categorical outcome, such as yes/no, buy/won't buy, or 1/0. It models the log odds of an event as a linear function of the independent variables. The logistic regression model expresses the probability of the event through the logistic function, while the logit model takes the log of both sides of that equation. Key metrics for evaluating a logistic regression include the log-likelihood, which should be as high as possible, and McFadden's pseudo-R-squared, which is favorable between 0.2 and 0.4. Underfitting occurs when the model fails to capture the underlying logic of the data; overfitting occurs when the model follows the training set so closely that it misses that logic.

COURSE NOTES: LOGISTIC REGRESSION
Logistic regression vs Linear regression

Logistic regression implies that the possible outcomes are not numerical but rather categorical. Examples of categories:

• Yes / No
• Will buy / Won't buy
• 1 / 0

Linear regression model: Y = β0 + β1X1 + … + βkXk + ε

Logistic regression model: p(X) = e^(β0 + β1X1 + … + βkXk) / (1 + e^(β0 + β1X1 + … + βkXk))
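The logistic model above can be sketched in a few lines of Python. The coefficient values below are hypothetical, chosen only to illustrate the formula:

```python
import math

def logistic(x, betas):
    """p(X) = e^z / (1 + e^z), where z = b0 + b1*x1 + ... + bk*xk.
    `betas[0]` is the intercept; `x` holds the feature values x1..xk."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical coefficients: intercept b0 = -1.0, one slope b1 = 2.0.
# At x1 = 0.5 the linear part z is exactly 0, so p(X) = 0.5.
print(logistic([0.5], [-1.0, 2.0]))  # 0.5
```

Note that the output is always strictly between 0 and 1, which is exactly why the logistic function is suitable for modeling probabilities.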
Logistic model

The logistic regression predicts the probability of an event occurring.

[Figure: visual representation of the logistic (S-shaped) function, mapping any input to a probability.]


Logistic regression model

odds = p(X) / (1 − p(X)) = e^(β0 + β1X1 + … + βkXk)

The logistic regression model is not very useful in itself: its right-hand side is an exponential, which is computationally inefficient and generally hard to grasp.

Examples of odds:
• Coin flip: the odds of getting heads are 1:1 (or simply 1).
• Fair die: the odds of getting a 4 are 1:5 (1 to 5).
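The two examples above can be verified directly. A minimal sketch of the odds formula:

```python
def odds(p):
    """odds = p / (1 - p), the ratio of the event happening to it not happening."""
    return p / (1 - p)

print(odds(0.5))    # coin flip: heads with p = 0.5 -> odds of 1.0 (i.e. 1:1)
print(odds(1 / 6))  # fair die: rolling a 4 with p = 1/6 -> odds of 0.2 (i.e. 1:5)
```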

Logit regression model

When we talk about a 'logistic regression', what we usually mean is a 'logit' regression – a variation of the model where we have taken the log of both sides:

log( p(X) / (1 − p(X)) ) = log( e^(β0 + β1x1 + … + βkxk) )

log( p(X) / (1 − p(X)) ) = β0 + β1x1 + … + βkxk

log(odds) = β0 + β1x1 + … + βkxk
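The identity above says that taking the log odds of the logistic model recovers the linear part exactly. A quick numerical check of this, for a few arbitrary values of z = β0 + β1x1 + … + βkxk:

```python
import math

def logit_of_logistic(z):
    """Apply the logistic function to z, then take the log odds back."""
    p = math.exp(z) / (1 + math.exp(z))    # logistic model
    return math.log(p / (1 - p))           # log(odds)

# For any z, log(odds) should come back as z itself (up to float rounding).
for z in (-2.0, 0.0, 1.5):
    print(z, logit_of_logistic(z))
```

This is the whole point of the logit form: the awkward exponential disappears, and the model becomes linear in its coefficients.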
Logistic regression model

Elements of the regression summary:

• Dependent variable, y: the variable we are trying to predict.
• Converged: indicates whether our model found a solution or not.
• Coefficient of the intercept, b0: sometimes we refer to this variable as the constant or bias.
• Coefficient of the independent variable i, bi: usually the most important metric – it shows us the relative/absolute contribution of each independent variable of our model. For a logistic regression, the coefficient contributes to the log odds and cannot be interpreted directly.
• McFadden's pseudo-R-squared: used for comparing variations of the same model. Favorable range: [0.2, 0.4].
• Log-Likelihood* (the log of the likelihood function): always negative. We aim for this to be as high as possible.
• Log-Likelihood-Null: the log-likelihood of a model which has no independent variables. It is used as the benchmark 'worst' model.
• Log-Likelihood Ratio p-value: measures whether our model is statistically different from the benchmark 'worst' model.

*Likelihood function: a function which measures the goodness of fit of a statistical model. MLE (Maximum Likelihood Estimation) tries to maximize the likelihood function.
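The metrics above can be computed by hand for a tiny example. The fitted probabilities below are hypothetical (stand-ins for the output of some fitted logit model), used only to show how log-likelihood, log-likelihood-null, and McFadden's pseudo-R-squared relate:

```python
import math

def log_likelihood(y, p):
    """Sum of log(p_i) where y_i = 1 and log(1 - p_i) where y_i = 0."""
    return sum(math.log(pi if yi == 1 else 1 - pi) for yi, pi in zip(y, p))

y = [0, 0, 1, 1, 1]
# Hypothetical predicted probabilities from a fitted logit model:
p_model = [0.2, 0.3, 0.7, 0.8, 0.9]
# The null model (no independent variables) predicts the sample mean everywhere:
p_null = [sum(y) / len(y)] * len(y)

ll = log_likelihood(y, p_model)        # always negative; higher is better
ll_null = log_likelihood(y, p_null)    # the benchmark 'worst' model
mcfadden = 1 - ll / ll_null            # McFadden's pseudo-R-squared

print(ll, ll_null, mcfadden)
```

Because the model's probabilities track the outcomes better than the constant null prediction, its log-likelihood is higher (closer to zero) than the null log-likelihood, and the pseudo-R-squared lands between 0 and 1.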
Underfitting: the model has not captured the underlying logic of the data.

Overfitting: our training has focused on the particular training set so much that it has "missed the point".

*Note that when we refer to the population model, we use Greek letters (β); for the sample coefficients we use Latin letters (b).
