Logistic Regression
Logistic Regression
Regularization
Regression vs Classification
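Having worked on regression problems for a while, we now move on to
classification problems, where the target y takes one of a small set of discrete
values (classes) rather than a continuous value.
When there are exactly 2 classes, the problem is binary classification; when the
number of classes is more than 2, the problem is termed multiclass classification.
We start with binary classification:
y ∊ {0,1}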
Let’s use our knowledge of linear regression and build an intuition from that, taking
note that in this case, our output is either 0 or 1.
Hypothesis Representation
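Since the target label is either 0 or 1, our hypothesis should also lie in that range:
0 ≤ hθ(x) ≤ 1 (this is what we want)
The linear regression hypothesis hθ(x) = θᵀx can take any real value, so we pass it
through the sigmoid (logistic) function:
g(z) = 1 / (1 + e^(−z))
This function takes in any real-valued input and produces an output strictly
between 0 and 1.
This is now the hypothesis for logistic regression:
hθ(x) = g(θᵀx)
And, substituting the definition of g,
hθ(x) = 1 / (1 + e^(−θᵀx))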
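A minimal Python sketch of the sigmoid and the hypothesis, assuming each input x is a list whose first entry is 1 so that θ0 acts as the intercept. The names `sigmoid` and `hypothesis` and the sample values of θ and x are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)), output in (0, 1)."""
    # Split on the sign of z so math.exp never receives a large positive
    # argument (math.exp raises OverflowError above roughly 709).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x).

    Assumes x[0] == 1 so that theta[0] acts as the intercept term.
    """
    z = sum(t * xi for t, xi in zip(theta, x))  # theta^T x
    return sigmoid(z)

# Hypothetical parameters and inputs, for illustration only.
theta = [0.0, 1.0]
for x in ([1.0, -1000.0], [1.0, 0.0], [1.0, 1000.0]):
    print(x, hypothesis(theta, x))
```

The middle input gives θᵀx = 0 and therefore exactly 0.5. Splitting on the sign of z avoids the OverflowError that the literal formula 1 / (1 + math.exp(-z)) would raise at z = −1000; note also that in floating point the two extreme inputs round to exactly 0.0 and 1.0, even though mathematically g(z) never reaches either value.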
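The hypothesis ranges between 0 and 1, so it can be interpreted as the estimated
probability that the output is 1 for input x.
Therefore,
hθ(x) = P(y = 1 | x; θ)
And, since y can only be 0 or 1, the two probabilities sum to 1:
P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ) = 1 − hθ(x)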
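As a small sketch of this interpretation, assuming the convention that x[0] = 1 for the intercept (the function name `class_probabilities` and the values of θ and x are hypothetical), the two class probabilities can be computed and checked to sum to 1:

```python
import math

def sigmoid(z):
    # Plain form of g(z); adequate for the moderate z used here.
    return 1.0 / (1.0 + math.exp(-z))

def class_probabilities(theta, x):
    """Return (P(y=1 | x; theta), P(y=0 | x; theta)) under the logistic model."""
    p1 = sigmoid(sum(t * xi for t, xi in zip(theta, x)))  # h_theta(x)
    return p1, 1.0 - p1

theta = [-1.0, 0.5]  # hypothetical parameters
x = [1.0, 4.0]       # hypothetical input, x[0] = 1 for the intercept
p1, p0 = class_probabilities(theta, x)
print(f"P(y=1) = {p1:.3f}, P(y=0) = {p0:.3f}")  # P(y=1) = 0.731, P(y=0) = 0.269
```

Here θᵀx = −1 + 0.5 · 4 = 1, so P(y = 1) = g(1) ≈ 0.731 and P(y = 0) is its complement.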
Decision Boundary
The hypothesis gives values ranging from 0 to 1. To obtain a class prediction, we
set a threshold: values greater than or equal to the threshold are mapped to 1 and
smaller values to 0. A common choice is 0.5:
predict y = 1 if hθ(x) ≥ 0.5
predict y = 0 if hθ(x) < 0.5
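The thresholding rule can be sketched as a one-line function. The name `predict` and its `threshold` parameter are hypothetical; ties at exactly the threshold go to class 1, matching the ≥ 0.5 convention.

```python
def predict(probability, threshold=0.5):
    """Map an estimated P(y=1 | x) to a class label: 1 if >= threshold, else 0."""
    return 1 if probability >= threshold else 0

for p in (0.1, 0.5, 0.73, 0.99):
    print(p, predict(p))  # labels 0, 1, 1, 1
```

Passing a different `threshold` shifts the cut-off; 0.5 is the default used in these notes.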
The logistic/sigmoid function has a useful property: when its input is greater than
or equal to 0, its output is greater than or equal to 0.5, and when its input is
less than 0, its output is less than 0.5. Since hθ(x) = g(θᵀx), we therefore predict
y = 1 exactly when θᵀx ≥ 0, and y = 0 when θᵀx < 0.
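The sketch below illustrates that thresholding hθ(x) at 0.5 and checking the sign of θᵀx give the same predictions. The helper names, the parameters θ = [−3, 1, 1], and the sample points are hypothetical; each point is written as [1, x1, x2] with a leading 1 for the intercept.

```python
import math

def sigmoid(z):
    # Plain form of g(z); adequate for the moderate z values used here.
    return 1.0 / (1.0 + math.exp(-z))

def score(theta, x):
    """theta^T x, assuming x[0] == 1 is the intercept feature."""
    return sum(t * xi for t, xi in zip(theta, x))

def predict_by_probability(theta, x):
    """Threshold the hypothesis: y = 1 if h_theta(x) = g(theta^T x) >= 0.5."""
    return 1 if sigmoid(score(theta, x)) >= 0.5 else 0

def predict_by_score(theta, x):
    """Equivalent rule without the sigmoid: y = 1 if theta^T x >= 0."""
    return 1 if score(theta, x) >= 0 else 0

# Hypothetical parameters and points [1, x1, x2], for illustration only.
theta = [-3.0, 1.0, 1.0]
points = [[1.0, 1.0, 1.0], [1.0, 3.0, 0.0], [1.0, 2.0, 2.0]]
for x in points:
    print(x[1:], predict_by_probability(theta, x), predict_by_score(theta, x))
```

The three points have θᵀx = −1, 0 and 1, so both rules give 0, 1, 1. Skipping the sigmoid is convenient when only the class label is needed, not the probability.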
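The decision boundary is the set of points where θᵀx = 0 (equivalently hθ(x) = 0.5);
it separates the region where we predict y = 1 from the region where we predict
y = 0. With two input features this boundary is a straight line.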
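With exactly two features x1 and x2 plus the intercept, setting θ0 + θ1·x1 + θ2·x2 = 0 and solving for x2 gives the boundary line. A minimal sketch, assuming θ2 ≠ 0; the helper name `boundary_x2` and the parameters θ = [−3, 1, 1] are hypothetical.

```python
def boundary_x2(theta, x1):
    """Return x2 on the decision boundary theta0 + theta1*x1 + theta2*x2 = 0.

    Assumes exactly two features and theta[2] != 0.
    """
    theta0, theta1, theta2 = theta
    if theta2 == 0:
        raise ValueError("theta[2] must be nonzero to write the boundary as x2 = f(x1)")
    return -(theta0 + theta1 * x1) / theta2

theta = [-3.0, 1.0, 1.0]  # hypothetical parameters: boundary is x1 + x2 = 3
for x1 in (0.0, 1.0, 2.0):
    print(x1, boundary_x2(theta, x1))  # x2 = 3.0, 2.0, 1.0
```

For these parameters, points with x1 + x2 ≥ 3 fall on the y = 1 side of the line and the rest on the y = 0 side.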
Cost Function
What cost function should we use?