Logistic Regression
Logistic regression is a statistical method for predicting a categorical outcome, most commonly
a binary one, from prior observations in a data set. It has become an important tool in
machine learning, where it allows an application to classify incoming data based on
historical data. A logistic regression model predicts a dependent variable by analyzing its
relationship with one or more independent variables. For example, logistic regression can be
used to predict whether a political candidate will win or lose an election, or whether a high
school student will be admitted to a particular college. The purpose of logistic regression is
to estimate the probabilities of events, including determining the relationship between
features and the probabilities of particular outcomes. Multinomial logistic regression extends
the idea to outcomes with more than two categories and can be used to classify subjects into
groups, based on a set of variables, in order to predict behavior.
Binary logistic regression is most useful when we want to model the event probability for a
categorical response variable with two outcomes.
$$\frac{p}{1-p} = \beta_0 + \beta_1 x$$
Odds are nothing but the ratio of the probability of success to the probability of failure.
For 0 < p < 1 the odds are always positive, which means their range is always (0, +∞). The
problem here is that this range is restricted, whereas the linear expression β₀ + β₁x can take
any real value. It is difficult to model a variable that has a restricted range, and
restricting the range in effect reduces the number of usable data points, which tends to
weaken the observed correlation. To get around this we take the log of the odds, which has a
range of (-∞, +∞):
$$\ln\left[\frac{p}{1-p}\right] = \beta_0 + \beta_1 x$$
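The two ranges can be checked numerically. The following is a minimal Python sketch; the function names `odds` and `log_odds` and the sample probabilities are illustrative choices, not part of the original text.

```python
import math

def odds(p: float) -> float:
    """Odds of success: p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)

def log_odds(p: float) -> float:
    """Natural log of the odds (the logit), defined for 0 < p < 1."""
    return math.log(odds(p))

# Illustrative probabilities (assumed values, chosen to span the interval).
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p={p:<5} odds={odds(p):9.4f} log-odds={log_odds(p):8.4f}")
```

The odds stay positive for every p, while the log-odds are negative below p = 0.5, zero at p = 0.5, and positive above it.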
Now we want a function of p, because we want to predict the probability itself, not the log of
the odds. To get it, we exponentiate both sides and then solve for p.
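$$
\begin{aligned}
e^{\ln\left[\frac{p}{1-p}\right]} &= e^{(\beta_0 + \beta_1 x)} \\
\frac{p}{1-p} &= e^{(\beta_0 + \beta_1 x)} \\
p &= e^{(\beta_0 + \beta_1 x)} - p\,e^{(\beta_0 + \beta_1 x)} \\
1 &= \frac{e^{(\beta_0 + \beta_1 x)}}{p} - e^{(\beta_0 + \beta_1 x)} \\
p &= \frac{e^{(\beta_0 + \beta_1 x)}}{1 + e^{(\beta_0 + \beta_1 x)}} \\
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\end{aligned}
$$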
Now we have our logistic function, also called a sigmoid function. The graph of a sigmoid
function is as shown below. It squeezes a straight line into an S-curve.
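The S-shape can also be seen numerically. Below is a minimal Python sketch of the logistic function; the helper name `predict_probability` and the coefficient values passed to it are assumptions made for illustration only.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)).

    The two branches are algebraically identical; splitting on the sign
    of z keeps math.exp from overflowing for large negative z.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(x: float, beta0: float, beta1: float) -> float:
    """Probability p for input x under the model p = sigmoid(beta0 + beta1 * x)."""
    return sigmoid(beta0 + beta1 * x)

# Evaluate the curve on a few points; beta0 = 0 and beta1 = 1 are assumed values.
for x in (-6, -3, -1, 0, 1, 3, 6):
    print(f"x={x:>3}  p={predict_probability(x, beta0=0.0, beta1=1.0):.4f}")
```

The printed values rise slowly near 0, fastest around p = 0.5, and flatten again near 1, which is the S-curve.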
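The slope m and intercept b of the straight line y = mx + b are estimated as:
$$m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
$$b = \bar{y} - m\,\bar{x}$$
where
x = independent variable
x̄ = average of the independent variable
y = dependent variable
ȳ = average of the dependent variable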
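The slope and intercept formulas translate directly into code. The following Python sketch is one possible implementation; the function name `fit_line` and the small data set are hypothetical and are not the data behind the numbers used in the text.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n  # average of the independent variable
    y_bar = sum(ys) / n  # average of the dependent variable
    # Numerator: sum of (x_i - x_bar) * (y_i - y_bar)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of (x_i - x_bar)^2
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

# Hypothetical data lying exactly on y = 2x + 1 (for illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
print(f"m={m}, b={b}")
```

Because the sample points lie exactly on a line, the function recovers that line's slope and intercept.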
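Now put x = 0.5 into the line to obtain the logit:
$$y = 0.2345 \times 0.5 + (-0.15366)$$
$$y = 0.11725 - 0.15366$$
$$y = -0.03641$$
Now calculate exp(value of logit), also called the odds:
$$\text{odds} = e^{-0.03641} \approx 0.96424$$
The probability then follows from the odds:
$$\text{pro} = \frac{\text{odds}}{1 + \text{odds}}$$
$$\text{pro} = \frac{0.96424}{1 + 0.96424}$$
$$\text{pro} = \frac{0.96424}{1.96424}$$
$$\text{pro} \approx 0.49090$$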
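The same calculation can be repeated in a few lines of Python as a check on the arithmetic. This is only a sketch: the values of m, b and x are the ones used in the text, and the variable names are chosen to match it.

```python
import math

m = 0.2345     # slope used in the text
b = -0.15366   # intercept used in the text
x = 0.5        # input value used in the text

logit = m * x + b          # value of the line at x (the log of the odds)
odds = math.exp(logit)     # exponentiate the logit to get the odds
pro = odds / (1.0 + odds)  # convert the odds to a probability

print(f"logit={logit:.5f}  odds={odds:.5f}  pro={pro:.5f}")
```

Computed at full precision, the probability comes out at roughly 0.49, that is, just under an even chance.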