
8.3.7 Logistic Regression

Logistic regression (logit regression) can serve as either a classification or a regression technique, depending on how it is used. It is a type of regression analysis used for predicting the outcome of a categorical dependent variable, similar to OLS regression. In logistic regression, the dependent variable (Y) is binary (0, 1) and the independent variables (X) are continuous in nature. The probabilities describing the possible outcomes (the probability that Y = 1) of a single trial are modelled as a logistic function of the predictor variables. In the logistic regression model, there is no R² to gauge the fit of the overall model; instead, a chi-square test is used to gauge how well the model fits the data. The goal of logistic regression is to predict the likelihood that Y is equal to 1 (the probability that Y = 1 rather than 0) given certain values of X. That is, if X and Y have a strong positive relationship, the probability that a person will have a score of Y = 1 will increase as values of X increase. So, we are predicting probabilities rather than the scores of the dependent variable.
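
To make "predicting probabilities" concrete, here is a minimal sketch of fitting a logistic regression and asking for P(Y = 1); it assumes scikit-learn is available, and the tiny dataset is invented purely for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one continuous predictor X and a binary outcome y.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(Y = 0), P(Y = 1)] per row; we keep P(Y = 1).
print(model.predict_proba(np.array([[3.5]]))[:, 1])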

For example, we might try to predict whether a small project will succeed or fail on the basis of the number of years of experience of the project manager handling it. We presume that project managers who have been managing projects for many years are more likely to succeed. This means that as X (the project manager's years of experience) increases, the probability that Y equals 1 (success of the new project) will tend to increase. Take a hypothetical example in which 60 already executed projects were studied and the years of experience of the project managers ranges from 0 to 20 years; we could represent this tendency for the probability that Y = 1 to increase with a graph.
To illustrate this, it is convenient to segregate years of experience into categories (i.e. 0–8, 9–16, 17–24, 25–32, 33–40). If we compute the mean score on Y (averaging the 0s and 1s) for each category of years of experience, we get an estimate of the probability of success within each band.
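
The averaging step itself is mechanical; the sketch below shows one way to do it (the data are simulated, the bin edges are chosen for the 0–20 year range rather than the wider bands above, and pandas is assumed available):

import numpy as np
import pandas as pd

# Simulate 60 past projects: years of experience and a 0/1 success flag.
# The coefficients -2.5 and 0.3 are invented for illustration.
rng = np.random.default_rng(42)
years = rng.uniform(0, 20, 60)
success = rng.binomial(1, 1 / (1 + np.exp(-(-2.5 + 0.3 * years))))

df = pd.DataFrame({"years": years, "success": success})
df["band"] = pd.cut(df["years"], bins=[0, 4, 8, 12, 16, 20],
                    include_lowest=True)

# Mean of the 0s and 1s per band = estimated P(Y = 1) for that band.
print(df.groupby("band", observed=True)["success"].mean())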

When the graph is drawn for the above values of X and Y, it appears like the graph in Figure 8.18. As X increases, the probability that Y = 1 increases. In other words, when the project manager has more years of experience, a larger percentage of projects succeed. A perfect relationship traces a smooth S-shaped curve rather than the straight line we saw in OLS regression. So, to model this relationship, we need mathematics that accounts for the bends in the curve.

An explanation of logistic regression begins with an explanation of the logistic function, which always takes values between zero and one. The logistic formulae are stated in terms of the probability that Y = 1, which is referred to as P; the probability that Y is 0 is then 1 − P.
FIG. 8.18 Logistic regression

ln(P/(1 − P)) = a + bX

The ‘ln’ symbol refers to the natural logarithm, and a + bX is the familiar regression line equation. The probability P can also be computed from the regression equation:

P = exp(a + bX)/(1 + exp(a + bX))

So, if we know the regression equation, we can, theoretically, calculate the expected probability that Y = 1 for a given value of X.

‘exp’ is the exponential function, which is sometimes also written as e, so that exp(z) means e^z.

Let us say we have a model that can predict whether a person is male or female on the basis of their height, and we are given a height of 150 cm. We know that the coefficients are a = −100 and b = 0.6. Using the above equation, we can calculate the probability that the person is male given a height of 150 cm, or more formally P(male | height = 150).

y = exp(a + b × X)/(1 + exp(a + b × X))
y = exp(−100 + 0.6 × 150)/(1 + exp(−100 + 0.6 × 150))
y ≈ 0.0000454

or a probability of near zero that the person is male.
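
The same arithmetic in a few lines of Python (standard library only; the coefficients are the ones given above):

import math

# P(male | height = 150) with a = -100, b = 0.6.
a, b, height = -100.0, 0.6, 150.0
p = math.exp(a + b * height) / (1 + math.exp(a + b * height))
print(p)  # ~0.0000454: near-zero probability of being male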

Assumptions in logistic regression

The following assumptions must hold when building a logistic regression model:

- There exists a linear relationship between the logit function and the independent variables (a quick empirical check is sketched below).
- The dependent variable Y must be categorical and take a binary value, e.g. if pass then Y = 1; else Y = 0.
- The data meet the ‘iid’ criterion, i.e. the error terms, ε, are independent of one another and identically distributed.
- The error term follows a binomial distribution [n, p], where
  n = number of records in the data
  p = probability of success (pass, responder)
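
The linearity-on-the-logit assumption can be eyeballed with a quick check: bin X, compute the proportion of 1s in each bin, convert those proportions to logits, and see whether they rise roughly in a straight line. A minimal sketch with simulated data (the coefficients and bin edges are invented):

import numpy as np

# Simulate data whose true logit really is linear in x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 20, 500)
y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.3 * x))))

# Five equal-width bins over [0, 20]; digitize against interior edges.
edges = np.linspace(0, 20, 6)
idx = np.digitize(x, edges[1:-1])
for b in range(5):
    p_hat = y[idx == b].mean()
    p_hat = min(max(p_hat, 1e-3), 1 - 1e-3)  # clip to avoid log(0)
    print(f"bin {b}: empirical logit = {np.log(p_hat / (1 - p_hat)):.2f}")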

8.3.8 Maximum Likelihood Estimation

The coefficients in a logistic regression are estimated using a process called Maximum Likelihood Estimation (MLE). First, let us understand what a likelihood function is before moving on to MLE itself.
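
As a preview, the quantity that MLE maximizes is the log-likelihood of the observed 0/1 outcomes, Σ [y ln P + (1 − y) ln(1 − P)], with P given by the logistic function. A minimal sketch of evaluating it (the data and trial coefficients are hypothetical):

import numpy as np

def log_likelihood(a, b, x, y):
    # P(Y = 1) for each observation under candidate coefficients a, b.
    p = 1 / (1 + np.exp(-(a + b * x)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
# MLE searches for the (a, b) pair that makes this value largest.
print(log_likelihood(-2.0, 0.8, x, y))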
