Logistic Regression
Logistic Regression
Khairul Alam
Professor, EEE Department
East West University
Aftabnagar, Dhaka, Bangladesh
1
Logistic Regression
Logistic Regression is commonly used to estimate the probability that an instance belongs to a
particular class. Example: what is the probability that this email is spam?
If the estimated probability is greater than a threshold value, then the model predicts that the instance
belongs to that class, or else it predicts that it does not.
It is important that we have to convert the model predicted values to probability. For this, logistic
regression uses the sigmoidal function to determine the probability. The mathematical and graphical
representation of the sigmoidal function is as follows.
1
σ (t)=
1+e−t
2
Logistic Regression
If the threshold value is 0.5, then it is very easy to classify the instance. If t is positive then
the instance belongs to the class otherwise not.
Linear regression is used to predict the continuous Logistic regression is used to predict the categorical
1 dependent variable using a given set of dependent variable using a given set of independent
independent variables. variables.
Linear regression is used for solving Regression
2 It is used for solving classification problems.
problem.
3 In this we predict the value of continuous variables. In this we predict values of categorical varibles.
4 In this we find best fit line. In this we find S-Curve .
The output must be continuous value,such as Output is must be categorical value such as 0 or 1, Yes
5
price,age,etc. or no, etc.
3
Logistic Regression
For illustration let us look at the following table that shows the number of hours each
student spent studying, and whether they passed (1) or failed (0).
5
Gradient of Cost Function
Gradient of the cost function of logistic regression.
m
1
L(θ )=− ∑ [ y i log (σ i )+(1− y i ) log (1−σ i )] zi =θ 0 +θ 1 x i
m i=1 1
∂L 1
m p i =σ i =σ ( z i )= −z
∂θ 0 m ∑
= [− y i (1−σ i )+(1− y i ) σ i ] 1+e i
∂ σ =σ ( z )[1−σ ( z )] ∂ z i =σ (1−σ )
i=1
m
∂L 1 ∂θ 0 i i ∂θ 0 i i
∂θ 1 m ∑
= [− y i (1−σ i )+(1− y i ) σ i ] x i
i=1
m ∂ σ =σ ( z )[1−σ (z )] ∂ z i =σ (1−σ ) x
∂L 1 ∂θ 1 i i ∂θ i i i
∂θ 1 m ∑
= [− y i +σ i ] x i 1
i=1 Gradient in matrix form
1 T
J (θ )= X (−Y +σ ( X θ ))
m
6
Logistic Regression Algorithm
40 1 lr.coef_
Out[9]: array([[0.64318441]])
ln ( )
p
1− p
= z=−15.42+0.643∗x
8
Logistic Regression Example
Question #1: Find probability of pass of a student who studies 30 hours xnew = np.array([30]).reshape(1,1)
log_reg.predict(xnew)
p
( )
Question#2 Minimum hours of study for a
ln =z=−15.42+0.643∗x probability of 0.6 log_reg.predict_proba(xnew)
1− p
( )
p log_reg.score(x,y)
ln
p
( )
1− p
=−15.42+0.643∗30
z=ln
1− p z=−15.42+0.643∗x
x=(0.405+15.42)/ 0.643
=ln ( )
0.6
x=24.6 hours
ln
p
( )
1− p
=3.8699
1−0.6
=0.405
1
p=
1+e−3.8699
p=0.979
9
Logistic Regression Example
import numpy as np
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
# x-data
x = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 5.0, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
x = np.array(x).reshape(len(x), 1)
# y-data
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
10
Logistic Regression Gradient
Descent
import numpy as np
import matplotlib.pyplot as plt
# x-data
x = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 5.0, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
one = np.ones((len(x), 1))
x = np.c_[one, x]
# y-data
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
y = y.reshape(len(y), 1)
11
Confusion Matrix
A Confusion matrix is an k × k matrix used for evaluating the performance of a
classification model, where k is the number of target classes. The matrix compares the
actual target values with those predicted by the machine learning model. By definition a
confusion matrix C is such that Cij is equal to the number of observations known to be in
group i and predicted to be in group j.
12
Confusion Matrix
Precision is defined as the ratio of the total number
of correctly classified positive classes divided by the
total number of predicted positive classes. Or, out of
all the predictive positive classes, how much we
predicted correctly.
13
Confusion Matrix
When to use Accuracy / Precision / Recall /
F1-Score?
15
Softmax Regression
The Logistic Regression model can be generalized to support multiple classes directly,
without having to train and combine multiple binary classifiers. This is called Softmax
Regression, or Multinomial Logistic Regression.
Similar to the sigmoid function of logistic regression, the softmax function is used in
softmax regression to determine the probability.
exp( z i )
s( z i )= K
∑ exp( z j )
j=1
16
Softmax Regression – Developing
the concept
To develop the understanding of softmax regression, we will consider 3 features (n = 3),
each feature has a length of 6 (m = 6), and there are 3 classes (K = 3). The output y = 1,
2, 3 means RED, GREEN, and BLUE. The system in matrix form is as follows. The
notation is xm,n.
[ ]
1 x 11 x 12 x13 We decompose Y Now use logistic Each θ has 4
X= 1 x 21 x 22 x23 into 3 classes, regression components (0,1,2,3).
⋮ ⋮ ⋮ ⋮ red, green, and algorithm to find θ Put them in each
1 x 61 x 62 x63 blue for each class column to get (n+1 x
()
1
K) matrix.
() () () [ ]
0 0 θ =logreg . fit ( X , y )
3 1 θ0 θ0 θ0
0 0 1 θ =logreg . fit ( X , y )
Y=
2
Y=
0 Y=
1 Y=
0
θ =logreg . fit ( X , y ) θ= θ1 θ1 θ1
2 0 1 0 θ2 θ2 θ2
0 1
3 0
0
θ3 θ3 θ3
1 0
1
17
Softmax Regression – Developing
the concept
Our model is ready. Now for prediction, assume a new value xnew = [x11, x12, x13]. The
steps to follow (1) compute z1, z2, z3. (2) use softmax function on z to find p1, p2, p3. (3)
pick up the maximum from p, and that is the output y. Example, say p2 is maximum, the
output class is GREEN.
[ ]
θ0 θ0 θ0 exp( z 1 )
θ θ1 θ1 p1 =
( z1 z 2 z 3 ) = ( 1 x 11 x 12 x 13 ) 1 exp( z 1)+exp (z 2 )+exp( z 3 ) y=max ( p1 , p2 , p 3 )
θ2 θ2 θ2
exp( z 2 )
θ3 θ3 θ3 p2 =
exp (z 1 )+exp ( z 2 )+exp (z 3 )
Z= xnew θ exp( z 3 )
p3 =
exp (z 1 )+exp( z 2 )+exp (z 3)
18
Softmax Regression – Gradient
Descent
We can easily extend the logistic regression method by doing some extra work as
follows. (1) keep the X matrix as it is. (2) decompose the y vector to Y matrix of size
m×K. (3) extend θ from a column vector to a (n+1)×K matrix. (4) use the same matrix
form formula of Jacobian as we did in logistic regression.
Gradient Descent Algorithm
[ ]
matrix
( ) ( ) ( )
1 x11 x12 x13
()
1 0 0
[ ]
1
X = 1 x21 x22 x23
3 0 0 1 θ0 θ0 θ0 Update weight
⋮ ⋮ ⋮ ⋮ 0 1 0 θ1 θ1 θ1
Y=
2 Y= Y= Y= & continue until
1 x61 x62 x63 0 1 0 θ2 θ2 θ2
2
1 θ3 θ3 θ3 convergency
3 0 0
1 0 0 θ new =θold + η∗J
1
19
Softmax Regression – Example
problem
Below is the height vs cat(1), dog(2), and lion(3) data. We will use gradient descent
algorithm to develop softmax regression model.
Height
5 7 8 9 11 12 13 18 30 32 34 40
[inch]
Animal 1 1 1 1 2 2 2 2 3 3 3 3
[] [ ]
1 5
1 7 1 0 0 m=12
1 8 1 0 0 n=1
1 9 ⋮ ⋮ ⋮ k =3
X= ⋮
1
⋮
30
Y= 0
0
⋮
1
1
⋮
0
0
⋮
[
θ = θ 01 θ 02 θ 03
θ 11 θ 12 θ 13 ] T
J (θ )=X (Y −σ ( X θ ))
X=[m×n +1]=[12×2]
1 32 Y =[m×k ]=[12×3 ]
1 34 0 0 1 θ =[n+1×k ]=[2×3 ]
1 40 0 0 1
20
Softmax Regression – Example
problem
Now use scikit learn to find the model parameters
import numpy as np
from sklearn.linear_model import LogisticRegression
θ=[ 12.1 0.127 −12.23
−0.972 0.225 0.746 ]
x new =10
# dimension
m = 12; n = 1; k = 3 z= [ 1 10 ] [
12.1 0.127 −12.23
−0.972 0.225 0.746 ]
#data z= [ 2.38 2.377 −4.77 ]
x1 = [5, 7, 8, 9, 11, 12, 13, 18, 30, 32, 34, 40]
e = [ 10.81 10.77 0.0085 ]
z
X = np.reshape(x1, (m, n))
Y = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]) 10.81
p1 = =0.500
10.81+10.77+0.0085
soft_reg = LogisticRegression(multi_class = "multinomial")
10.77
soft_reg.fit(X, Y) p2 = =0.499
10.81+10.77+0.0085
soft_reg.intercept_ 0.0085
p3 = =0.001
soft_reg.coef_ 10.81+10.77 +0.0085
21
Softmax Regression – Example
problem
Suppose you have built a device that can detect the 6 digits of the car’s license plate number
through multi-class classification. There are 10 possible classes (0-9) for each of the 6 digits.
The probability of these classes for the 6 digits of a car is given as follows. Determine the
license plate number of the car.
23
Confusion Matrix for Multi-class
Similarly, for class-2, the converted one-vs-all confusion matrix will look like the following:
Using this concept, we can calculate
the class-wise accuracy, precision,
recall, and f1-scores and tabulate
the results:
24
Confusion Matrix – Scikit Learn
The sklearn library has a confision matrix class, which you should import and work on it.
sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)